Literature Sources of Mammalian Toxicity Data, with Special Emphasis on Tabulating Machinery Applications

W. J. WISWESSER

Downloaded by CALIFORNIA INST OF TECHNOLOGY on January 20, 2018 | http://pubs.acs.org Publication Date: January 1, 1956 | doi: 10.1021/ba-1956-0016.ch011

Willson Products, Inc., Reading, Pa.

A Chemical Toxicity Registry has been developed for the purpose of cataloging and reproducing mammalian toxicity data at low cost with standard punched card tabulating equipment. This report summarizes:

1. A survey of the literature sources, and a method for coding the references.
2. A concise method for coding, tabulating, and correlating the toxicity data.
3. A simple method for linearly describing the corresponding structures of the chemicals with the 37 "teletype" symbols of standard tabulating equipment.
4. A simple method for sorting functional groups in an elementary manner.
5. A decimal method for coding the formula index and ring index values according to five natural numeric measures.
6. A simple method for supplementing the above chemical identifications with a short name or coded systematic name, based on established word roots.

When the industrial hygienist is asked what type of health protection should be provided in some exposure to a new chemical, determination must be made as to whether the chemical is dangerously toxic or only slightly toxic. Thus, in both the industrial and medical fields, biochemical toxicity data are as important as the physical chemical data on vapor pressures, transition temperatures, solubilities, and reactivities. In answering such questions, the hygienist usually turns to the literature to find what has been reported about the health hazard. Unfortunately, data of this type recorded in the literature must be used with care because no standardized method or basis for reporting data exists. Consequently, much must be done in the standardization and distribution of this fundamental information. In 1953, Chemical and Engineering News (2) announced a commendable new National Safety Council program to evaluate the handling and shipping hazards of new chemicals with a minimum of five standardized toxicity tests.

Standardization also is needed for the efficient storage and retrieval of all kinds of data relating to specific chemicals. Indexing is becoming mechanized (3), but the abstracting and cataloging cost is so great that the task cannot be attempted unless the information is shared through pooled research efforts (10, 13). The author is convinced that satisfactory standardization can be achieved in the immediate future if the problem is given the proper scientific attention.

Early in 1953, a simple "checkoff" questionnaire was sent to a number of pharmaceutical and toxicological laboratories, in a systematic effort to obtain majority opinions from these authorities on four basic questions:

1. Which journals are most widely regarded as literature sources containing, or leading to, mammalian toxicity information? (answered in Table I).

A Key to PHARMACEUTICAL AND MEDICINAL CHEMISTRY LITERATURE Advances in Chemistry; American Chemical Society: Washington, DC, 1956.


2. Which specific toxicity measures are most widely favored as standards? (answered in Table II).
3. Which of the many kinds of structure measures seem most promising for classification and correlation purposes? (answered in Table III).
4. Which types of concise identifications for chemical compounds are favored in systematic toxicity tabulations? (answered in Table IV).


For many years the author has been interested in locating, recording, storing, retrieving, and correlating toxicity data through the use of systematic measures and mechanical aids. The present paper describes a system employing punched cards adapted for visual and mechanical manipulation, which has been developed as the result of many experimental refinements. In this system three main series or decks of cards are prepared:

1. "Chemical identification" cards (one for each compound) on which are recorded:
   a. The chemical structure, written in the Wiswesser notation,
   b. A short name identification,
   c. Biochemical and physical data,
   d. The functional groups present in the compound,
   e. A "formula file number" based on the compound's empirical formula, and
   f. The literature reference in terms of year of publication, author, and journal. If more than one reference is recorded, additional cards are provided on which items b, c, and d are replaced by as many as eight year-author-journal references.
2. "Reference identification" cards (one for each reference) on which are recorded:
   a. Name of author(s),
   b. Abbreviated title or subject, and
   c. The literature source in terms of journal, year, volume, page, and author's initials. The year-author-journal marks permit correlation of a specific reference card with one or more "chemical identification" cards.
3. "Journal identification" cards (one for each journal) on which are recorded:
   a. The title of the journal, book, or other literature source,
   b. The address of the publication, in terms of city, zone, and state (or country if not United States), and
   c. A two-letter code to indicate the literature source. This code permits correlation by machine with the described "reference identification" and "chemical identification" cards.

Ordinarily, searches in terms of chemical structure, property, or name are conducted on the "chemical identification" cards. When the desired cards have been located, the appropriate "reference identification" and "journal identification" cards are selected mechanically, making use of the common code. In case the number of selected cards is large, the three types of cards are correlated and sequenced, also by high-speed machine methods. The information sought may be taken manually directly from the cards, or the cards may be fed into a tabulating machine to list the information in any preferred arrangement (at the rate of 9000 cards or complete lines of information per hour). If multiple copies are desired, these can be run from Ditto masters made at the same rate in the tabulating machine, with Ditto tabulator carbon. The layout of these cards, the technique of recording or searching, and other comprehensive uses of these cards are described in subsequent sections of this paper.

Toxicity Literature

Current sources of toxicity data (periodicals) are listed in Table I, which includes the suggestions and revisions obtained through the author's questionnaire. In this list, the current sources are arranged and identified by means of a two-letter code, referred to hereafter as the J-Code, i.e., Journal Code. Mailing addresses and other detailed information on these sources were obtained from a comprehensive current list (8), prepared in 1953 by the Industrial Hygiene Foundation, at the Mellon Institute. The abbreviations used in that list "conform, with some modifications, to those used by Chemical Abstracts," and the list is arranged in the alphabetic order of the abbreviations. This same arrangement is followed in other comprehensive lists (11), so this majority preference also is followed in Table I. For example, the abbreviated letters in Ind. Hyg. Digest come before Index Medicus, even though the represented word "Industrial" would follow "Index."
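The ordering rule in the example above can be illustrated with a short sketch. It assumes, as one reading of the text rather than a stated rule, that titles are compared word by word on their abbreviated forms with periods ignored; the helper name `abbreviation_key` is hypothetical.

```python
def abbreviation_key(title: str) -> tuple:
    """Sort key: the title's words, upper-cased, with periods dropped."""
    return tuple(word.strip(".").upper() for word in title.split())


abbreviated = ["Index Medicus", "Ind. Hyg. Digest"]
spelled_out = ["Index Medicus", "Industrial Hygiene Digest"]

# Word-by-word comparison: "IND" sorts ahead of "INDEX",
# while "INDUSTRIAL" sorts after "INDEX".
by_abbreviation = sorted(abbreviated, key=abbreviation_key)
by_full_word = sorted(spelled_out, key=abbreviation_key)

print(by_abbreviation)  # ['Ind. Hyg. Digest', 'Index Medicus']
print(by_full_word)     # ['Index Medicus', 'Industrial Hygiene Digest']
```

Under this assumption the abbreviated title files first, while the spelled-out title would file second, which is the reversal the text points out.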


Table I. Current Sources of Toxicity Data^a
(Illustrating the Two-Letter J-Code)

AA  Acta Med. Scand.      IY  Can. J. Comp. Med.    QY  J. Pharm. Ex. Thp.
AE  Acta Pharml. Tox.     JB  Can. J. Med. Sci.     RB  J. Physiology
AI  Acta Physio. Scn.     JF  Can. J. Res. (Tec.)   RF  J. Sci. Food Agr.
AM  Acta Radiol.          JJ  CAN. MED. ASSN. J.    RJ  J. Urology
AQ  Agr. Chemicals        JN  CANCER RESEARCH       RN  Klin. Wochschr.
AU  Agr. & Food Chms.     JR  CHEM. ABSTRACTS       RR  LANCET
AY  Am. Heart J.          JV  Chemical Age          RV  Med. Bul. SO/NJ
BB  Am. Ind. Hy. An. Qt.  KC  Chem. Engg.           SC  Med. J. Australia
BF  Am. J. Clin. Path.    KG  Chem. Eng. News       SG  Med. Klinik
BJ  Am. J. Dis. Chldrn.   KK  Chem. Products        SK  Med. lavoro
BN  Am. J. Medicine       KO  Chem. Safety D.       SO  Med. Res. Lab. Rep.
BR  Am. J. Med. Sci.      KS  Chem. Trade J.        SS  Mfg. Chemist
BV  Am. J. Obst. Gync.    KW  Chemisch. Zentr.      SW  Minerva Medica
CC  Am. J. Ophthalmol.    KZ  Chemistry & Ind.      SZ  Minnesota Med.
CG  AM. J. PATHOLOGY      LD  Circulation           TD  Modern Sanitn.
CK  Am. J. Pharmacy       LH  Circ. Research        TH  Monthly Rev. N. Y.
CO  Am. J. Physiology     LL  Compens. Med.         TL  Nat. Nuc. En. Ser.
CS  Am. J. Pub. Health    LP  COMPT. REND.          TP  Nat. Safety News
CW  Am. J. Roent. Ra. T.  LT  Com. ren. soc. bio.   TT  Nature
CZ  Am. J. Trop. Med. H.  LX  Cornell Vet.          TX  NEW ENG. J. MED.
DD  Am. J. Vetnry. Res.   MA  CURRENT LIST M.       UA  N. Y. State J. Med.
DH  Am. Rev. Tuberc.      ME  Curr. Med. Digest     UE  Nord. Hyg. Tidskr.
DL  Anat. Record          MI  Deut. med. Wochsr.    UI  Nucl. Sci. Absts.
DP  Angiology             MM  Diseases of Chst      UM  Occpl. Health
DT  ANN. INTERN. MED.     MQ  Endocrinology         UQ  Ohio Ind. Com. Mr.
DX  Ann. pharm. franc.    MU  EXCERPTA MEDICA       UU  Ohio St. Med. J.
EA  Antibio. & Chemo.     MY  Exptl. Cell Res.      UY  Pa. Bur. Ind. Hyg.
EE  Arch. Bioch. Biop.    NB  Federatn. Proc.       VB  Perfumer
EI  Arch. Derm. Syph.     NF  Folia Haematol.       VF  Pest Control
EM  Arch. Ex. Path. Ph.   NJ  Folia medica          VJ  Pharmcl. Revs.
EQ  Arch. f. ges. Phys.   NN  G. Brit. H. M. S. O.  VN  Physics Today
EU  ARCH. IND. H. O. M.   NR  Gigiena i Sanit.      VR  Practitioner
EY  Arch. Intern. Med.    NV  Helv. Med. Acta       VV  Pr. Am. Vet. Md. An.
FB  Arch. int. pharmd.    OC  Ind. Eng. Chem.       WC  Pr. Soc. Ex. Bi. Md.
FF  Arch. int. physio.    OG  Ind. Health B. NJ     WG  Pr. Soc. St. In. Md.
FJ  Arch. mal. profes.    OK  Ind. Health Mo.       WK  PUB. HEALTH REPS.
FN  Arch. Pathology       OO  Ind. Health Rev.      WO  QRT. J. EX. PHYSIO.
FR  Assn F & D Offls.     OS  IND. HYG. DIGEST      WS  Rev. Tuberculose
FV  Atomic En. Comm.      OW  IND. MED. & SURG.     WW  Sammlung. Vergft.
GC  Biochim. biophys.     OZ  Ind. Data Sheets      WZ  Schweiz. med. Wo.
GG  Biochem. J            PD  INDEX MEDICUS         XD  Science
GK  Biochem. Zeit.        PH  J. Allergy            XH  Semaine hop. Paris
GO  BIOL. ABSTRACTS       PL  J. AM. MED. ASSN.     XL  So. Afr. Med. J.
GS  Blood                 PP  J. AM. PHARM. ASN.    XP  Soap Sanit. Chem.
GW  Bol. soc. it. bio.    PT  J. AM. VET. MD. AN.   XT  Southern Med. J.
GZ  BRIT. ABSTRACTS       PX  J. BIOL. CHEM.        XX  Squibb Abs. Bull.
HD  Brit. J. Cancer       QA  J. Gen. Physiol.      YA  SUMMARY TAB., NRC
HH  Br. J. Exp. Path.     QE  J. Hygiene            YE  Tex. Rep. Biol. Med.
HL  BRIT. J. IND. MED.    QI  J. Lab. Clin. Med.    YI  Tr. Asn. Ind. Med. O.
HP  Brit. J. Nutritn.     QM  J. Nutrition          YM  Trans. Nat. TB Asn.
HT  BRIT. J. PHARMCL.     QQ  J. PATH. BACTL.       YQ  U. S. Armed F. M. J.
HX  Brit. J. Ven. Dis.    QU  J. PHARMACY & P.      YU  U. Cal. Pb. Pharm.
IA  Brit. Med. Bul.                                 YY  Wien. Klin. Woch.
IE  BRIT. MED. J.                                   ZC  Z. ges. in. Med. G.
II  Bull. Hygiene                                   ZG  Z. physiol. Chem.
IM  Bul. Johns Hp. H.                               ZK  Z. Unfallmed. Bk.
IQ  Bul. mem. soc. md.                              ZO  Zent. Arbeitsmed.
IU  Bull. Soc. chim. F.                             ZS  Zent. f. Chir.

^a Names of most widely consulted journal or abstract sources are capitalized.


Journal Identification Cards. The full titles and mailing addresses of the sources listed in Table I are best elaborated on a master set of just a few hundred "journal identification" cards. These have a layout as suggested in the examples below; a separate card is used to record the data on each horizontal line, such as the "title line" itself. Each mark, space, or dot represents one of the 80 columns available on the IBM information-carrying cards. The blank spaces here denote printing spaces (on the cards and in tabulations) that do not require corresponding separating blanks in the punched columns of the cards, for example, between columns 43 and 44, or between columns 49 and 50. Punched column recording capacity is never wasted to separate the printing of adjacent fields. Thus, successive lines of information (i.e., cards) appear as follows, when sorted by the J-code in columns 48 and 49.

COLUMN   1         2         3         4             5         6         7         8
1234567890123456789012345678901234567890123 45 67 89 0123456789012345678901234567890
.TITLE OF PERIODICAL IN 40 OR LESS MARKS.V. Y. A. J. CITY AND ZONE SYMBOL..STATE...#
ACTA MEDICA SCANDINAVICA................... .. .. AA STOCKHOLM K.........SWEDEN.....
ACTA PHARMACOLOGICA ET TOXICOLOGICA........ .. .. AE COPENHAGEN K........DENMARK....
AGRICULTURAL CHEMICALS..INDUSTRY PUBLNS.... .. .. AQ NEW YORK 1..........NEW YORK...
AMERICAN INDUSTRIAL HYGIENE ASSN QUARTERLY. .. .. BB CHICAGO 11..........ILLINOIS...
AMERICAN J OF CLINICAL PATHOLOGY........... .. .. BF BALTIMORE 2.........MARYLAND...
ANNALS OF WESTERN MEDICINE AND SURGERY..... .. .. DZ LOS ANGELES 5.......CALIFORNIA.
ARCHIVES BELGES DE MED SOCIALE HYG ETC..... .. .. EC BRUSSELS............BELGIUM....
ARCHIVES OF INDUSTRIAL HYG AND OCCPL MED..2 50 .. EU CHICAGO 10..........ILLINOIS...
J OF INDUSTRIAL HYGIENE AND TOXICOLOGY...31 49 .. Q2 ..SEE 50 EU....................

The title of the periodical, as shown by the leading card, is given in the first 40 columns; this field size is sufficient to permit the vast majority of names to be spelled out in full. A few obvious abbreviations like "AMER" and "J" are justified by their extensive usage. The journal code from Table I is punched in columns 48 and 49 (in the "J" space). The source address is confined to a 20-column "city and zone" field (columns 50 through 69) and a 10-column "state or nation" field (columns 70 through 79); the information punched into these last 30 columns is printed most conveniently (by the IBM "interpreter") on the second printing line of these journal identification cards. (Standard IBM tabulators also can be wired to list two-line sets in a similar manner, with columns 44 through 49 also on the second line.)

New titles are added with reserve letter combinations such as DZ and EC in the above list of examples; and old titles no longer in use can be included as letter-number combinations, such as the Q2 example. Optional volume-year correlations are shown in columns 41 through 45 on these cards. For example, the printed figures can show the number of volumes per year and the first year of publication of a new journal, as in the EU example, or the last volume number and last year of publication of a discontinued one, as in the Q2 example.

The last column 80 (#) is reserved to distinguish the different decks or sets of cards (e.g., the journal identification, reference identification, or chemical identification cards) and the succession of additional cards (first, second, or third) when more than one is necessary to record the full information. In most cases more than enough space is provided on a single card, and this reserve capacity can be used for supplementary information. For example, the 40-column field for the journal name is sufficient to show the following helpful detail:

OCCUPATIONAL HEALTH..US GOVT PRINTG OFC

This principle of providing economic use for reserve capacity also is illustrated in the "reference identification" and "chemical identification" cards.
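The journal-card layout described above amounts to a fixed-width record, which can be sketched as follows. This is a minimal illustration, not the author's procedure: the function names and the dictionary of field names are hypothetical, blank columns are written as spaces, and columns 46 and 47 (the "A" space) are assumed to be unused on this deck.

```python
# 1-based, inclusive column ranges as given in the text.
JOURNAL_FIELDS = {
    "title": (1, 40),
    "volume": (41, 43),
    "year": (44, 45),
    "j_code": (48, 49),
    "city_zone": (50, 69),
    "state": (70, 79),
    "deck_mark": (80, 80),
}


def build_journal_card(title, j_code, city_zone, state,
                       volume="", year="", deck_mark=" "):
    """Lay the fields out in their columns and return an 80-column string."""
    card = (title.ljust(40) + volume.rjust(3) + year.rjust(2)
            + "  "                      # columns 46-47, assumed unused here
            + j_code.ljust(2) + city_zone.ljust(20) + state.ljust(10)
            + deck_mark)
    if len(card) != 80:
        raise ValueError("a field is too long for its columns")
    return card


def parse_journal_card(card):
    """Slice an 80-column card into named fields, trimming blanks and dots."""
    if len(card) != 80:
        raise ValueError("a card has exactly 80 columns")
    return {name: card[start - 1:end].strip(" .")
            for name, (start, end) in JOURNAL_FIELDS.items()}


sample = build_journal_card("AMERICAN J OF CLINICAL PATHOLOGY", "BF",
                            "BALTIMORE 2", "MARYLAND")
fields = parse_journal_card(sample)
print(fields["j_code"], "|", fields["city_zone"], "|", fields["state"])
```

Keeping the column ranges in one table mirrors the way a tabulator is wired once per deck: the same slicing then serves for sorting on the J-code or for listing any field.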


Reference Identification Cards. A few hundred cards suffice to identify the journal and text sources of toxicity data, but references to specific articles may run into the thousands in a master catalog. Hence all of the literature source information is made available for many kinds of punched card correlations (by journal, date, author, subject, etc.) through the inclusion of a distinct set of "reference identification" cards. Specific journal articles are identified on these cards, with authors' names, page-volume numbers, the year-author-journal codes, and short title or subject identifications (see preceding comments on journal identification cards):


COLUMN   1         2         3         4             5         6         7         8
1234567890123456789012345678901234567890123 45 67 89 0123456789012345678901234567890
.LAST NAMES AND INITIALS OF AUTHORS.PAGE..V Y. A. J. TITLE OR SUBJECT IDENTIFICATN.#
BRANDT A D...MC CONNELL AND FLINN..00132.61 46 AB WK COMPOSN OF TRADE NAME SOLVENTS.
COOK WARREN A......................00936..7 45 WC OW SUMMARY OF MAC VALUES..........
LEHMAN A J...CHEMICALS IN FOODS....00047.16 52 AL FR SUBACUTE AND CHRONIC TOXICITIE.
LEHMAN A J...CHEMICALS IN FOODS....00122.15 51 AL FR PESTICIDES AND TOXICITIES......
MC CORMICK W E.....................00038.13 52 WM BB CHEMICALS IN RUBBER PRODUCTS...
MC LAUGHLIN R S..500 CASE REPORTS..01355.29 46 RM CC EYE BURN CASES FROM 180 CHMLS..
SLOAN KETTERING INST FR CANCER RES.00376..4 52 SK YA INTRAPERIT TOLERANCES FOR MICE.
SMYTH H F JR AND C P CARPENTER.....01363.29 46 HS CC EYE INJURY GRADES ON 180 CHMLS.
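The six-column year-author-journal code in columns 44 through 49, whose journal, book, and pamphlet variants are explained in the paragraph that follows, can be decoded with a few positional tests. The sketch below is illustrative only: the function name is hypothetical, the year is assumed to stay in the first two positions for books and pamphlets, and the book and pamphlet codes used as examples are invented.

```python
def decode_reference_code(code: str) -> dict:
    """Split a six-column reference code into its parts.

    Positions 1-2: year digits. A blank in position 3 marks a pamphlet
    (positions 4-6 name the issuing organization); a blank in position 5
    marks a book (position 6 is the publisher's initial); otherwise
    positions 3-4 are author initials and 5-6 the journal code.
    """
    if len(code) != 6:
        raise ValueError("a reference code occupies exactly six columns")
    year = code[0:2]
    if not year.isdigit():
        raise ValueError("the first two positions must be year digits")
    if code[2] == " ":
        return {"kind": "pamphlet", "year": year, "organization": code[3:6]}
    if code[4] == " ":
        return {"kind": "book", "year": year,
                "author": code[2:4], "publisher": code[5]}
    return {"kind": "journal", "year": year,
            "author": code[2:4], "journal": code[4:6]}


print(decode_reference_code("46ABWK"))  # journal code from the first example card
print(decode_reference_code("52AL W"))  # invented book code
print(decode_reference_code("53 NSC"))  # invented pamphlet code
```

Testing for the pamphlet blank before the book blank matters only for malformed codes; a well-formed code has at most one of the two blanks.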

The first 35 columns of these reference cards are reserved as shown for author (and subtitle, etc.) information, and the last 30 (columns 50 through 79) for subject identification. The latter information is printed (interpreted) on the second printing line of these cards, as with the journal identification cards. More elaborate subject coding could be used if some of the "title" columns were reserved for that purpose, but the self-evident word descriptions seem to be preferred in these cards. Five columns (36 through 40) are reserved for the page number, and the next three (41 through 43) for the volume number. Here again the blank spaces in the leading card denote printing spaces that separate adjacent punching columns.

Each specific journal reference is identified by a unique six-column code (columns 44 through 49), composed of three pairs of symbols: two digits which directly identify the year number, two letters which generally give the author's initials, and the two "journal code" letters from Table I. Text and pamphlet references also can be included in this six-column code. Books are distinguished by a blank space in the fifth position (column 48), followed by the publisher's initial in the last position. Miscellaneous pamphlets are distinguished by a blank space in the third position (column 46), followed by a three-letter identification of the issuing organization, such as ACS, NSC, NRC, PHS, SPB, etc. These three-letter organizational identifications can be elaborated on "journal identification" cards.

Chemical Identification Cards. A widely applicable third set of cards complements the above two sets by summarizing the tabulated physico-biochemical data for each specific compound. If the information on a chemical card is summarized in a single reference, the corresponding reference code is given in columns 44 through 49, exactly as described above. Otherwise, a multiplicity of reference code numbers is given on a chemical card having "R card" printed in the reference field (columns 44 through 49). One such "R card" can carry up to eight reference codes (for a simple chemical structure) in addition to the complete chemical notation and a formula-ring-indexing "serial number" identification (see "Formula Index Numbers"). This multi-reference layout for a chemical card is illustrated below, with others.

The chemical card layout is illustrated below, in terms of the field identifications for structures with increasing degrees of complexity (increasing length of notation). The 80 consecutive punching columns are identified across the top by adjacent numerals, without the printing (interpreting) spaces that are provided after columns 43, 45, 47, and 49. As in the preceding two sets of journal and reference cards, the printable information in the first 49 columns is "wired" to be printed across the top printing line of the card; the remaining printable information appears on the second line.

COLUMN   1         2         3         4             5         6         7         8
1234567890123456789012345678901234567890123 45 67 89 0123456789012345678901234567890


.NOTATION.....VAPOR DATA......SHORT NAME... Y. A. J. xxxx.WEIGHT DATA.....FORMULA..#
.NOTATION FOR NONVOLATILES....SHORT NAME... Y. A. J. xxxx.WEIGHT DATA.....FORMULA..#
.NOTATION FOR UNNAMED COMPLEX STRUCTURES... Y. A. J. xxxx.WEIGHT DATA.....FORMULA..#
.NOTATION TO HERE ON 1ST CARD.SHORT NAME... Y. A. J. xxxx.WEIGHT DATA.....FORMULA..*
.NOTATION FOR VERY COMPLEX STRUCTURES CAN GO TO COLUMN 70 ON 2ND CARD.FORMULA...
.NOTATN REF.#8 REF.#7 REF.#6 REF.#5 REF.#4 R CARD REF.#3 REF.#2 REF.#1 FORMULA..*

The first 14 columns suffice to denote the structures of all appreciably volatile compounds, which must be relatively simple molecules, in terms of the systematic notation discussed in "Chemical Structure Notation." [With this line-formula notation, Smith (12) showed that only ten or fewer columns are necessary to describe 5580 of the 7105 most common chemicals, those listed in the Hodgman (7) and Lange (9) handbooks. Only six of these 7105 notations require more than 30 columns; and the longest notation (Lange No. 3732) requires only 41 columns (12).] Next, 16 columns (15 through 30) are reserved for vapor data such as the critical thermochemical constants, or the vapor toxicity ratings described in "Toxicity Ratings"; hence the first 30 columns are reserved for the notation of nonvolatile compounds that also have a short name. Short names, or the shortened systematic names described in "Short Name Identification," are carried in the next 13 columns (31 through 43) with occasional use of the preceding columns for prefix marks; thus, the first 43 columns can be wired directly to the corresponding alphabetic tabulator positions. Notations for unnamed very complex structures can, of course, continue to the 43rd punching column, as shown in the above symbolized examples.

For those rare "one in a thousand" compounds that have a very long structure description and a short name, a second chemical card is provided (and distinguished with its first part by a "zone" punch in column 80). The first 30 marks of the notation repeat on this card, and the notation continues as far as necessary, to the ultimate second card limit at column 70; the remaining nine columns (71 through 79) are reserved for a unique "serial number" identification which incorporates formula-indexing numbers (in "Formula Index Numbers"). These extra cards, both for the notation and for multiple references, are interpreted without any blank spaces after columns 43, 47, etc., since blank spaces within the notation have specific meaning.

The field-overlapping principle that is illustrated above also can apply individually in all cases that require a second card. For example, the longest notation among 7105 handbook compounds (Lange No. 3732) requires 41 columns and has a short name identification (Fast Red D); therefore the second card can carry the complete notation and all other data except the short name. Since all such second cards are identified uniquely with the first ones through the "serial number" in columns 71 through 79, no extra effort is necessary to keep these card pairs adjacent. Column 80 carries two sets of punches: the numeric punches identify any one of nine different sets of cards, such as journal, reference, or chemical cards, and successive additions in the last set; the "zone" punches identify the first, second, or third card of its kind, for any multi-card entry within any set.
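The field-overlapping rules for the chemical cards can be restated as a small decision function. This is a sketch under stated assumptions rather than the registry's actual procedure: the function name and return convention are hypothetical, and notations longer than 14 columns are assumed to carry no vapor data, since the text treats volatile compounds as having short notations.

```python
VOLATILE_LIMIT = 14     # notation field when vapor data follow in columns 15-30
NAMED_LIMIT = 30        # notation field when a short name follows in columns 31-43
FIRST_CARD_LIMIT = 43   # notation field for unnamed structures
SECOND_CARD_LIMIT = 70  # notation limit on a second card


def plan_chemical_cards(notation_length: int, has_short_name: bool):
    """Return (cards needed, last column available to the full notation)."""
    if not 1 <= notation_length <= SECOND_CARD_LIMIT:
        raise ValueError("notation must occupy 1 to 70 columns")
    if notation_length <= VOLATILE_LIMIT:
        return (1, VOLATILE_LIMIT)
    if notation_length <= NAMED_LIMIT:
        return (1, NAMED_LIMIT)
    if notation_length <= FIRST_CARD_LIMIT and not has_short_name:
        return (1, FIRST_CARD_LIMIT)
    # Long notation with a short name, or any notation beyond column 43.
    return (2, SECOND_CARD_LIMIT)


print(plan_chemical_cards(10, True))   # simple volatile compound
print(plan_chemical_cards(41, False))  # unnamed complex structure
print(plan_chemical_cards(41, True))   # the Fast Red D case from the text
```

The Fast Red D case shows why the second card exists: a 41-column notation fits the first card only if the short-name field is given up.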


While the punched cards are designed to be the primary information carriers, a great deal more can be done with them in an information center that has access to IBM accounting machinery. For example, a committee might want a few dozen Ditto copies of a list that summarizes all the references on just one chemical or one group of chemicals (selected, perhaps, by several independent sorting searches). In one rapid machine operation, the collected Reference Identification cards can be merged with the Journal Cards in such a way that each different "J-Code" among the Reference Identification cards is followed by that Journal Identification card. This merged collection of cards could "instruct" the tabulating machine to print the desired Ditto masters at the rate of 9000 lines or cards per hour. Similar lists can be made for any sequenced sets of cards. If the Reference Identification cards followed some sequence other than the "J-Code" letters, these letters could be wired to appear in any two letter-printing columns and the entire list of the explanatory Journal Identification cards could be issued separately. With standard IBM tabulators the only limitation is that the alphabetic (or letter-and-number) information must be printed in the first 43 columns of the list, since the remaining columns print numbers only. However, all three of these journal, reference, and chemical card layouts are designed with this limitation in mind, generally with the result that the pertinent alphabetic information never extends beyond a 43-column field. After the listing operation, the cards can be sorted on column 80 for return to their respective catalogs. If the project were a continuing one, the selected cards first could be duplicated and kept in the listed order, ready to receive any occasional additions for future annual or monthly revisions.
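The merge described above, in which each run of reference cards sharing a J-Code is followed by the matching journal card, can be sketched in a few lines. The record shapes and the function name are hypothetical stand-ins for the punched cards, and the journal titles are the abbreviated forms read from Table I.

```python
from itertools import groupby
from operator import itemgetter


def merge_for_listing(reference_cards, journal_cards):
    """Sequence reference cards by J-Code and follow each run of one
    J-Code with the journal identification card for that code."""
    ordered = sorted(reference_cards, key=itemgetter("j_code"))
    listing = []
    for j_code, group in groupby(ordered, key=itemgetter("j_code")):
        listing.extend(card["text"] for card in group)
        listing.append(journal_cards[j_code])
    return listing


references = [
    {"j_code": "OW", "text": "COOK WARREN A  45 WC OW"},
    {"j_code": "CC", "text": "MC LAUGHLIN R S  46 RM CC"},
    {"j_code": "CC", "text": "SMYTH H F JR AND C P CARPENTER  46 HS CC"},
]
journals = {
    "CC": "Am. J. Ophthalmol.  CC",
    "OW": "IND. MED. & SURG.  OW",
}

listing = merge_for_listing(references, journals)
for line in listing:
    print(line)
```

Because the sort is stable, reference cards that share a J-Code keep their incoming order, just as a card sorter preserves sequence within a pocket.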
This reference-listing example illustrates just one of the countless benefits (many of them unpredictable) that can be obtained from a punched-card cataloging investment.

Table II. Preferences for Various Kinds of Toxicity-Coding Measures
(Figures in parentheses denote the percentage of toxicologists who replied to this question and favored the indicated measure)

Definition of Toxicity Rating

A. Volatile hazards, measured as parts per million (p.p.m.) by volume
   Concentration lethal to any animal in 5 to 10 minutes
   Narcotic concn. for rats (11), mice (7), other animals (4)
   Inhalation LD50 for rats (24), mice (14), dogs (4) after 4 hr.
   Tolerable, LD0, for rats (7), mice (7), others (7) for 1 hour
   Threshold limit (M.A.C., max. allowable concn.) 8-hr. daily
B. Threshold limit, in mg. per cubic meter of air
C. Doses expressed as mg. per kg. of animal body weight
   Single oral LD50, rats (35), mice (25), dogs (11), cats (7)
   Single intraperitoneal LD50, for rats (29), mice (29)
   Daily intraperitoneal LD0, for rats (14), mice (29)
   Single dermal LD50, for rabbits (32), mice (11), rats (7)
   Daily oral LD0, tolerated by rats (39) or mice (14) for period of 1 mo. (4), 3 mo. (7), 6 mo. (7), 12 mo. (4), 18 mo. (4), 24 mo. (7)
D. Feed tolerance, as weight parts per million of feed
   Safe limit, LD0 after 1 month, for rats (25) or mice (4)

Total Favorability, %: 75, 57, 21, 54; 21, 18, 42, 21, 57, 29; 50, 32

Toxicity Ratings

Toxicity determinations require costly laboratory investments, so correlation of scattered information is desirable as much as possible, but the data have negligible correlating value unless the test conditions are standardized. Four essential test specifications are method of administration, test species, criterion of toxicity, and definition of measured units. Many possible variations must be reduced to a small number of standard combinations if the information is to be coded and correlated. Thus, the National Safety Council plan (3) employs five standardized toxicity tests to evaluate shipping hazards with a safe minimum of testing investment.

Table II summarizes the preferences for a dozen different toxicity ratings, as revealed by the responses to the questionnaire. Most of these are further qualified by preferences for the test animals, shown by the figures in parentheses. Even when test conditions are specified, imperfect experimental techniques may cause misleading results. For example, a strong minority of the toxicologists objected to intraperitoneal administrations, particularly for repeated doses, because adhesions or other complications might increase the mortality. The purity of the test chemicals also should be stated, since lethal quantities of impurities obviously would have their effect. Toxicologists repeatedly have cautioned against deceptive "precision" in the LD50 values, which are misleading because of the inevitable variability of individuals receiving the doses, as well as those giving them! For example, Craver and associates at Ciba summarized many subtle causes of response variations when doses were administered intravenously to rodents (4).

The numeric values used in this proposed toxicity registry (Tables III and IV) are based on a very simple grading system developed by Smyth and associates at the Mellon Institute (lb). Their "range-finding" numeric grades have three valuable attributes for punched card notation and correlation:

1. A single tabulating mark (number or letter) suffices to show the value.
2. The geometric intervals defined by these marks permit simple comparisons among different sets of toxicity ratings (as on logarithmic scales).
3. The magnitude of the intervals (approximate powers of two) properly defines the limit of accuracy for the vast majority of published figures.

Most of the toxicologists who replied to the questionnaire acknowledged the need for a standard set of "pure number" intervals such as these, suitable for coding the magnitude of the dose in parts per million, in milligrams per cubic meter, or in milligrams per kilogram of body weight.
Single code marks for 30 successive grades of dosage are as follows:

Mean Value   Mark     Mean Value   Mark     Exact Range       Mean Value   Mark
500,000      &        500          0        0.36-0.69         0.5          #
250,000      A        250          1        0.18-0.35         0.25         J
125,000      B        125          2        0.09-0.17         0.125        K
63,000       C        63           3        0.045-0.089       0.063        L
32,000       D        32           4        0.022-0.044       0.032        M
16,000       E        16           5        0.011-0.021       0.016        N
8,000        F        8            6        0.0056-0.010      0.008        O
4,000        G        4            7        0.0028-0.0055     0.004        P
2,000        H        2            8        0.0014-0.0027     0.002        Q
1,000        I        1            9        0.0007-0.0013     0.001        R

On the IBM card, the three numerically parallel sets of values are distinguished by the top "zone" punch (denoted as &) and the second "zone" punch (denoted as #). Tabulated zero numerals can be distinguished from O-letters with care, but their identities are ensured by "slashing" the zeros and "barring" the O's. On the cards the punched positions alone are sufficiently distinctive.

Similar grades of dosage values, with a slightly smaller geometric interval of 1.5, were proposed by Deichmann and Mergard (5), while larger intervals were proposed by Drinker and Cook (6).

Table III. Structure Measures Favored for Toxicity Registry
(Parenthetical figures denote the percentage of those who replied to the question and favored the indicated measures)

Presence of N atoms in compound (74)       Number of C atoms in compound (79)
Presence of S atoms in compound (68)       Number of H atoms in compound (51)
Presence of aromatic character (21)        Number of O atoms in compound (47)
Presence of HALOGEN atoms (74)             Number of RINGS in structure (37)
Presence of triple bonds (32)              Number of unsaturations (37)
Presence of quaternary atoms (27)          Number of "branched" atoms (16)
Presence of METALLIC atoms (11)            Number of functional or alkaryl units in structure (32)
Presence of heterocyclic rings (12)

Table IV. Types of Structure Identification Favored for Toxicity Registry
(Parenthetical figures denote the percentage of those who replied to the question and favored the indicated measures)

(39) One suitable for use with edge-notched (hand-sorting) cards.
(56) One suitable for use with punched cards and tabulating machinery.
(26) An agreed name of not more than ten (9), or of any number (4) of letters.
(13) A natural "classifying" number of two (4) or any number (4) of digits.
(39) A complete description, in linear (17) or two-dimensional (9) printing, based on a 40-character (13) or 80-character (4) keyboard, and on "line-formula" (22) or "root-naming" (4) principles.
(4) A binary (present or absent) identification of "functional" groups.


Table V illustrates the use of the single IBM marks to show toxicity ratings (in parts per million by volume) of some common vapors, along with the line-formula notation (see "Chemical Structure Notation") and the "shorthand" names (see "Short Name Identification"). The seven kinds of toxicity ratings shown in Table V can be carried in the columns of the "chemical identification" cards as indicated below:

Col. 15, Concentration lethal to any animal in 5 to 10 minutes (P.P.M. units)
Col. 17, Concentration lethal to man in 30 to 60 minutes (P.P.M. units)
Col. 19, Concentration intolerable to man for 10 minutes (P.P.M. units)
Col. 21, Concentration tolerable to man for 1 hour (P.P.M. units)
Col. 23, Threshold limit (maximum allowable concentration) (P.P.M. units)
Col. 25, Irritant or nuisance threshold (P.P.M. units)
Col. 27, Odor threshold (P.P.M. units)

Table V. Sample 43-Column Listing

NOTATION     SHORT NAME
OU1O1        ME FORMATE
OU1          FORMALDEHYDE
ZH           AMMONIA
NC1U1        ACRYLONITRILE
SCS          CBN DISULFIDE
FH           HYD FLUORIDE
R            BENZENE
OSO          SFR DIOXIDE
GH           HYD CHLORIDE
SHH          HYD SULFIDE
ONO          NIT DIOXIDE
NCH          HYD CYANIDE
ZR           ANILINE
WNR          NITROBENZENE
GG           CHLORINE
G1V1         CHLORACETONE
EE           BROMINE
I1V1         IODOACETONE
GPGG         P3 CHLORIDE
E1VO2        ET BR ACETAT
GXGGOVG      DIPHOSGENE
E1R1         XYL BROMIDE
WNXGGG       CHLOROPICRIN
1OSWO1       ME2 SULFATE

[The single-mark RATINGS entries printed between the notation and the short name of each line are not legible in this copy.]

These successive measures follow a trend from high to low concentrations; this sequence provides more space in the notation field for complex compounds which could not exist in the high concentrations that are given in columns 15 and 17. Narcotic concentrations apply only to relatively simple compounds that have very short line-formula notations; thus columns 12, 13, and 14 can be reserved to show narcotic concentrations for rats, mice, or man, respectively. Column 20 can show the inhalation LD50 for rats after four hours of exposure; and column 22, the tolerable LD for rats after one hour.

If revisions are necessary in any of these toxicity column assignments, the cards containing a mark in the questioned columns can be sorted out quickly and the revised layout can be reproduced automatically. This chemical card layout is the result of several such revisions.

The tabulating applications, such as the direct creation of Ditto masters at the rate of 9000 complete lines per hour, are of sufficient potential value to justify special consideration. (For example, the two separate components of the IBM letter punches can be sensed to print the two-column numeric equivalent of Smyth's toxicity grades.) In some standard tabulators, the "zone" punch for the 0.5 value (denoted here as the # mark) may print as a zero; while this #-zero could hardly be mistaken for the top 500,000 value, it might be confused with the O-letter (a very slightly narrower figure in the IBM type) since this mark represents the 0.008 value. To avoid any such tabulating ambiguities, the # mark should be replaced by the J punch in the few places where the 0.5 value is coded; then on the punched card itself, the lower component of this J-punch (position 1) can be circled to show that the true rating is 0.5 rather than 0.25, if such refinement is justified.

Toxicity ratings in weight units apply to the most complex structures in the registry. Therefore all weight ratings are given in the right-hand part of the card, outside the notation-vapor-name field illustrated in Table V.
(Here again the letter punches could be sensed in two parts to print a two-column numeric grade.) Weight ratings for radioactive substances extend beyond the 0.001 value (for example, in milligram units), so all tabulating ambiguities among single-mark symbols can be avoided by using S for the 0.0005 value, T for 0.00025, etc., to Z for the least possible 0.000004 value. Thus, the mark U or value 0.00012, in milligrams per body units, represents the maximum amount of radium (0.1 microgram) that could safely be deposited in the human body. The plutonium limit in these same units is represented by the R mark (1 microgram); and in contrast, the normal 50-gram salt content of the human body is represented by the numeral 3 (50,000 mg.).

These single-mark equivalents of Smyth's "range-finding" grades provide sufficient capacity on the chemical-information cards for 16 different toxicity ratings by weight, in addition to the three fields of information illustrated in Table V. Physicians' doses may be included among the weight ratings, since these values obviously are conservative measures of safe body tolerances. Columns 55 to 70 are reserved for these 16 different ratings. The equally concise structure descriptions may extend from the first few columns (for common solvents) to the 43rd letter-printing column (for exceedingly complex drugs).
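The single-mark grading described in this section can be sketched in a few lines. The following is only an illustrative reading of the 30-grade listing, not part of the registry itself: the names `grade_mark` and `mean_value` are hypothetical, the "0" mark for the 500 grade is an assumption, and choosing the nearest mean value on a logarithmic scale only approximates the printed "exact range" limits.

```python
import math

# Mean values of one ten-step set, as listed for the single code marks.
BASE_MEANS = [500, 250, 125, 63, 32, 16, 8, 4, 2, 1]

# Thirty marks in descending order of dose. The "0" mark for the 500 grade
# is an assumption; numerals 1 to 9 follow it for the grades 250 down to 1.
MARKS = "&ABCDEFGHI" + "0123456789" + "#JKLMNOPQR"
MEANS = ([m * 1000 for m in BASE_MEANS]
         + [float(m) for m in BASE_MEANS]
         + [m / 1000 for m in BASE_MEANS])


def grade_mark(dose):
    """Return the mark whose mean value is nearest to `dose` on a log scale."""
    if dose <= 0:
        raise ValueError("dose must be positive")
    log_dose = math.log(dose)
    nearest = min(range(len(MEANS)),
                  key=lambda i: abs(log_dose - math.log(MEANS[i])))
    return MARKS[nearest]


def mean_value(mark):
    """Return the mean value that a single mark stands for."""
    return MEANS[MARKS.index(mark)]


if __name__ == "__main__":
    for dose in (60000, 250, 0.3, 0.008):
        mark = grade_mark(dose)
        print(dose, "->", mark, "(mean value", mean_value(mark), ")")
```

Because every grade is about half the one above it, two ratings for the same chemical can be compared simply by counting the number of marks between them.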


Chemical Structure Notation

The line-formula notation featured in this punched-card catalog also has been used independently by Benson (1) to code 3500 Eastman chemicals on Remington-Rand cards, and by Smith (12) to code over 7100 Handbook chemicals on IBM cards at the University of Hawaii. Smith employed the systematic contractions that are explained in the fully detailed manual for this notation (15), and demonstrated that 80% of the Handbook structures (7, 9) could be described with ten or fewer punched-card columns. Smith and Benson also made statistical studies which showed, for example, how well the first two marks of this line-formula notation divide the 7100 Handbook chemicals among several hundred functionally distinct subclasses.

Table VI illustrates the specific letter symbols which, with the generic alkyl group symbol, A, describe common aliphatic structure types. Specific alkyl groups are distinguished by arabic numerals which denote the number of carbon atoms in these chain units. Unsaturations are indicated by colon marks in typewritten or printed copy, or by the letter U in punched-card equipment. Thus, ethylene is 1:1, acetylene is 1::1, and butadiene is 1:2:1.

Only two new specific letter symbols are necessary to describe the open-chain hydrocarbons: Y for the ternary carbon atom, and X for the quaternary carbon atom. These letters graphically suggest the corresponding bond patterns. Only two additional new letter symbols are necessary to describe the thousands of aliphatic oxygen derivatives of these hydrocarbons: Q for the hydroxyl or OH group, and V for the very common carbonyl connective, the bivalent CO group. Three new letter symbols are introduced with the nitrogen derivatives: K for the quaternary and cationic nitrogen atom, M for the imino or mid-amino NH group, and Z for the terminal or primary amino NH2 group (as in hydrazine).

Table VI.

Descriptive Symbols for Common Aliphatic Structure Types
[Letter A denotes any alkyl(ene) connective]

A. Unbranched Chain Structures

AMA        sec-Amines              NNNA       Azides
AN:NA      Azo compounds           OCNA       Isocyanates
AOA        Ethers                  O:A        Aldehydes
AOOA       Peroxides               O:NA       Nitroso compounds
AOVOA      Carbonates              PHHA       Phosphines
ASA        Sulfides                QA         Alcohols
ASSA       Disulfides              QOA        Hydroperoxides
A:A        Alkenes                 QVA        Carboxylic acids
A:A:A      Alkadienes              SCNA       Isothiocyanates
A::A       Alkynes                 SHA        Mercaptans
AVA        Ketones                 SHVA       Thiolic acids
AVOA       Esters                  S:A        Thioaldehydes
AVOVA      Anhydrides (acid)       WNA        Nitroalkanes
CNA        Isocyanides             WNOA       Nitrates
FA         Fluoroalkanes           ZA         primary Amines
GA         Chloroalkanes           ZVA        Amides
M:A        Imines                  ZVAVQ      Amic acids
NCA        Nitriles                ZVMMA      Semicarbazides
NCOA       Cyanates                ZVMVA      Ureides
NCSA       Thiocyanates            ZVOA       Urethanes

B. Branched Structures (Containing Ternary and Quaternary Atoms)

ANA.A      tert-Amines             S:YA.A     Thioketones
AOPHO.OA   Phosphites              S:YA.SH    Dithioic acids
AOPOA.OA   Orthophosphites         S:YGA      Thioacyl chlorides
AOSO.OA    Sulfites                S:YQA      Thionic acids
AOSWOA     Sulfates                S:YZMMA    Thiosemicarbazides
AOYA.OA    Acetals                 S:YZMN:A   Thiosemicarbazones
AOYOA.OA   Orthoformates           WNYA.:NQ   Nitrolic acids
ASXA,A.SA  Mercaptols              WSA.A      Sulfones
ASYA.SA    Mercaptals              WSA.OA     Sulfonates
AXA,A.A    Neoalkanes              WSGA       Sulfonyl chlorides
AYA.A      Isoalkanes              WSQA       Sulfonic acids
M:YA.OA    Imido-esters            ZMYA.:M    Imidrazides
M:YGA      Imide chlorides         ZNA.A      as-Hydrazines
M:YZMA     Guanidines              ZN:YA.A    Keto-hydrazones
OAsA,A.A   Arsinoxides             ZN:YA.MZ   Hydrazide-hydrazones
OKA,A.A    Amine oxides            ZVNA.A     as-Ureas
QXA,A.A    tert-Alcohols           ZYA.:M     Amidines
QYA.A      sec-Alcohols            ZYA.:NQ    Amidoximes
QYA.OA     Hemiacetals             ZYA.:NZ    Amidrazones
QYA.:M     Imidic acids            ZYA.:S     Thionamides
QYA.:NQ    Hydroxamic acids        ZYVQ/A     L-alpha-Amino acids

Sulfonyl and nitro groups introduce the symbol W for the nonlinear "dioxo" or O2 part of these functional groups. Finally, the two most common halogen atoms are denoted by distinctive single letters which facilitate machine operations (see "Binary Searching Field"): G for the chlorine atom, avoiding the typewriter ambiguity in Cl, and E for the bromine atom. Thus, hydrogen and the halogens fall within an alphabetically compact E, F, G, H, I sequence, to which J can be added for a "jeneric halojen"!
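Two of the mechanical points made in this section lend themselves to a small sketch: the replacement of the typewritten colon by the letter U on punched cards, and the use of the first two marks of a notation as a classifying key. The helper names `to_punched` and `group_by_first_two_marks` are hypothetical and serve only as illustration; the sample notations are those quoted in this chapter.

```python
from collections import defaultdict


def to_punched(notation):
    """Replace each typewritten unsaturation colon by the punched-card letter U."""
    return notation.replace(":", "U")


def group_by_first_two_marks(notations):
    """Group notations by the first two marks of their punched-card form."""
    groups = defaultdict(list)
    for notation in notations:
        punched = to_punched(notation)
        groups[punched[:2]].append(punched)
    return dict(groups)


if __name__ == "__main__":
    print(to_punched("1:1"))    # ethylene
    print(to_punched("1::1"))   # acetylene
    print(to_punched("NC1:1"))  # acrylonitrile
    sample = ["ZR", "ZH", "WNR", "WNXGGG", "G1V1", "O:1", "O:1O1"]
    for key, members in sorted(group_by_first_two_marks(sample).items()):
        print(key, members)
```

In this small sample the nitro compounds fall together under WN and the two aldehyde-type notations under OU, which is the kind of functional grouping that the first two marks are meant to provide.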


The remarkably efficient discriminating power of these line-formula symbols is demonstrated by Smith's and Benson's statistical analyses: the first two marks divide the 7100 Handbook chemicals into 263 different "functional" groups, and 147 of these contain less than 0.1% of the catalog (less than 7 chemicals each). Benson's corresponding analysis of the 3500 Eastman chemicals shows surprisingly close agreement; thus, the percentages in the 24 largest groups compare as follows (Smith vs. Benson lists):

Marks   Group                    Smith, %   Benson, %
1O-     Methoxy compds.          1.2        1.5
1R-     Tolyl compds.            1.2        1.2
1V-     Acetyl compds.           1.6        2.3
1Y-     Isopropyl compds.        2.3        1.7
2O-     Ethoxy compds.           2.8        2.8
G1-     Cl-methyl compds.        0.8        1.2
GR-     Chlorophenols            0.8        1.2
GV-     Cl-formyl compds.        0.7        1.5
NC-     Nitriles                 1.3        1.8
OU-     Aldehydes                1.9        2.7
Q1-     HO-methyl compds.        1.4        1.2
Q2-     HO-ethyl compds.         1.4        1.2
QR-     Phenols                  4.3        3.8
QV-     Carbox. acids            8.2        5.8
QY-     sec-Alcohols             1.8        1.2
WN-     Nitro compds.            6.6        6.0
WS-     Sulfo compds.            1.3        1.4
ZR-     Aniline derivs.          5.0        4.0
ZV-     Amides, ureas, etc.      2.2        1.4
ZY-     "sec-Amines," etc.       1.5        1.2
L-      (Carbopolycyclics)       2.1        1.5
L6-     (Cyclohexyl, etc.)       6.5        6.5
T5-     (Heterocyclics)          5.8        5.2
T6-     (Heterocyclics)          6.0        5.2

Each of the remaining 239 two-letter subdivisions of the 7100 Handbook chemicals contains less than 1% of the catalog (less than 70 chemicals each).

Cyclic compounds other than simple benzene or polyphenyl derivatives constitute a surprisingly small fraction of the commonly met chemicals: only 23% of the 7100 Handbook chemicals, and only 26% of the 3500 Eastman chemicals. Thus the open-chain and benzene-ring derivatives must be recognized as the dominating types among the commonly met compounds. The chapter headings in general or advanced textbooks of organic chemistry reflect this same proportion. Even in the National Research Council's catalog of 50,000 biologically tested compounds, tetracyclic and higher ring systems constitute only about 2% of the total, and 1.4% of these are sterol derivatives!

The benzene ring is cited far more often than all other rings combined, so this singular prominence (even in the Beilstein Handbook) justifies the use of a single letter R for this resonating ring. All other ring systems are described with the well-known Ring Numbers; in the cycloalkyl derivatives, these are logically associated with the alkyl chain symbols, and in polycyclic systems, the ring numbers are cited in pictorially direct order. These numeric descriptions of ring systems all are distinctively enclosed in parentheses. Lower case letters locate all ring positions through their alphabetic order; these "locants" constitute a logically distinct and very concise set of symbols. They establish the relative positional relations in a way that never can be confused with existing systems of "enumeration."

In May 1950, at the National Research Council's First Symposium of the Chemical-Biological Coordination Center (10), the author privately discussed a way of using this line-formula notation in standard punched-card equipment. Briefly, the key idea centers on the use of a blank space within the punched-card notation: it conveys "lower case" meaning if it precedes a letter (ideal with the locants), and "superscript" meaning if it prefixes a number (ideal for multipliers of radicals, as well as the necessary but infrequent isotope numbers). The Notation manual should be consulted for more elaborate details relating to structure descriptions.

The Binary Searching Field

There are 1369 possible combinations of the first two line-formula symbols (26 letters, 10 numerals, and the blank space). A few hundred of these would not appear if the notation is used according to the rules (e.g., "forbidden" combinations like 1Q- and Y1-) when compounds of all elements are considered. Thus, the first two marks alone of the line-formula notation distinguish roughly a thousand subclasses in very large chemical catalogs. When combinations of functional groups are sought, however, even this enormous discriminating power is not enough. What is needed with standard punched-card equipment is a compact binary (i.e., present or absent) searching field: one that provides a fixed punching place for each elementary symbol in the structure description. A "binary" field shows present or absent, punched or unpunched, distinctions, as with edge-notched cards.

A four-column IBM field is sufficient when the elementary variables are the symbols of this notation, because there are 48 binary positions in these four columns, and only 36 single-mark symbols, like 2 for ethyl groups, 4 for butyl or cyclobutyl groups, Q for alcohols, V for ketones, and Z for primary amino or NH2 groups. Thus, 12 additional positions remain for independent variables such as ionic nature, aromatic character, heterocyclic group, etc., and for a few very prominent two-letter groups such as carboxylic acids (QV) and esters (OV). The latter two assignments prevent overloading of the alcohol (Q), ether (O), and ketone (V) punches.

Table VII summarizes the assigned meanings for the 48 positions in this binary searching field. The numerals and capital letters are re-registered directly from the corresponding atomic group symbols in the IBM notation (e.g., both M and G for a Mg atom), except that the benzene "R" is punched in the upper (R) position. All positional designations are disregarded. The other parenthetically bracketed punching positions identified in Table VII are the additional semi-independent variables. The discriminating capacity of this four-column binary field is enormous: if only 40 of the 48 structural distinctions were fully independent variables, these four columns could show 2^40, or about 1,000,000,000,000, combinations!

Columns 50 to 53 on the chemical identification cards are reserved for this binary searching field, as shown in Table VII. An additional column 54 is reserved to show multiple occurrences of any one symbol-punch in the corresponding horizontal row of the searching field.
Thus the cards for polymethyl or polyphenyl compounds would have the 1-position punched again in column 54; polybromo or polyketo compounds, the 5-position; and polyhydroxy compounds, the 8-position punched again in column 54. (The vertical positions for the letter-punches are obtained directly from the lower components of the standard IBM letter-punches, and the three horizontal positions, columns 51, 52, or 53, are determined from the upper or "zone" components. Thus, all of the 36 notational symbols can be "regenerated" from the binary searching field for proofreading work, in a 36-column specially wired tabulation.)
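The re-registering of notation symbols into the binary searching field can be illustrated with a short sketch. It follows the position layout that Table VII summarizes, but it is a simplification made under stated assumptions: the function names are hypothetical, every R is treated as a benzene ring (so the lower R-punch for atomic symbols is ignored), the zero numeral is assumed to occupy the 0-position of column 50, and the bracketed punches such as (q) for QV-acids and (o) for OV-esters are not generated.

```python
from collections import Counter

# Letters carried in columns 51 to 53, with the row of the first letter.
LETTER_COLUMNS = ((51, "ABCDEFGHI", 1), (52, "JKLMNOPQ", 1), (53, "STUVWXYZ", 2))
SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def punch_position(symbol):
    """Return (card column, row) for one notation symbol."""
    if symbol in "0123456789":
        return 50, int(symbol)      # row 0 for the zero numeral is assumed
    if symbol == "R":
        return 53, 1                # benzene R goes in the upper (R) position
    for column, letters, first_row in LETTER_COLUMNS:
        if symbol in letters:
            return column, first_row + letters.index(symbol)
    raise ValueError("not a notation symbol: " + repr(symbol))


def binary_field(notation):
    """Return the set of (column, row) punches for columns 50 to 54."""
    counts = Counter(ch for ch in notation if ch in SYMBOLS)
    punches = set()
    for symbol, count in counts.items():
        column, row = punch_position(symbol)
        punches.add((column, row))
        if count > 1:
            punches.add((54, row))  # multiple occurrence, same horizontal row
    return punches


if __name__ == "__main__":
    print(sorted(binary_field("WNXGGG")))  # chloropicrin
    print(sorted(binary_field("ZR")))      # aniline
```

A search for, say, chlorinated nitro compounds then reduces to selecting the cards that carry the G, W, and N punches together, whatever the order of those symbols in the notation.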


Table VII. Binary Searching Punches

Col. 50   Col. 51   Col. 52   Col. 53
(t)       (a)       (s)       (x)
(d)       (e)       (o)       (h)
0         (i)       (q)       (c)
1         A         J         (R)
2         B         K         S
3         C         L         T
4         D         M         U
5         E         N         V
6         F         O         W
7         G         P         X
8         H         Q         Y
9         I         R         Z

Additional Punches (Bracketed Symbols)

(t)  Twenty or more C-atoms in alkyl group
(d)  Decyl to nonadecyl alkyl chains
(a)  General alkyl group
(e)  Element or alloy
(i)  Inorganic compound
(s)  Salt or ion-pair compound
(o)  OV-ester (not punched in O or V)
(q)  QV-acid (not punched in Q or V)
(x)  Aromatic character in any ring besides R
(h)  Heterocyclic ring
(c)  Carbocyclic ring other than benzene-R
(R)  Benzene ring

The lower R-punch (Col. 52-9) is reserved for R in the atomic symbol, as in CR.

Formula Index Numbers

The results of the author's questionnaire shown in Tables III and IV demonstrate that the structural measures are still favored as searching tools for chemical catalogs and include those used in the Chemical Abstracts' Formula Indexes and Ring Indexes. Punched cards are ideal carriers for formula index numbers, because each atom count can serve as an independent searching and sorting aid, or classifying and filing measure. Subdivisions based on these atomic totals represent truly elementary distinctions that set broad, natural limits to the possible combinations of functional groups. For example, a search for halogenated nitro compounds must be confined to those sections of the Formula Index that contain at least one halogen atom, one nitrogen atom, and two oxygen atoms; the minimum total of at least four heteroatoms also is a search-limiting measure.

Most structural searches can be focused to an astonishing degree by storing the cards on the shelf according to just two or three natural formula-indexing measures of small magnitude (less than ten) that are easily learned, quickly counted, and always remembered. A practical consequence of no small importance is that this elementary storage accelerates searches and retards card wear, because sections of the catalog that cannot contain the minimum elements obviously need not be passed through the searching machines.

Optimum sorting efficiency with IBM equipment is obtained when the actual formula numbers are reduced to a simpler set of numerical measures, each with probable values ranging from zero to nine. One such measure might be the number of oxygen atoms, up to nine and more in the final division. This measure gives a rather poor distribution among the high values, however; compounds with one to four oxygen atoms in the structure are far more probable than those with five or more oxygen atoms. If several equally simple atom-counting measures were found to give reasonably balanced population distribution among the ten-digit values of each measure, their ease of usage would clearly outweigh the slight loss in capacity due to imperfect distributions.

Four such formula file numbers have been found and tested in large indexes. The first number, designated as an A-digit, divides the Chemical Abstracts Formula Index into ten equally large or equally important elementary sections:

A-Digit   Elementary Definition              Per Cent of Catalog
0         Oxy-hydrocarbons (C,H,O only)      15.2
1         N1 (C,H,O) compounds               9.2
2         N2 (C,H,O) compounds               8.6
3         N3-n (C,H,O) compounds             9.8
4         S (C,H,O) compounds                7.8
5         S,N (C,H,O) compounds              17.3
6         Halogen (C,H,O) compounds          7.9
7         Halogen, N (C,H,O) compounds       11.6
8         All other organic compounds        8.7
9         All inorganic compounds            3.9
          Total                              100.0

The percentage figures indicate the distribution for the ten-year period from 1942 to 1951. The recent increases among P, Si, or F compounds might seem large within the 8.7% total of the A8 division, but these little expansions have not disturbed the long-enduring dominance of the C, H, O, N, S compounds. Formula indexes from smaller general collections show a much smaller percentage in the A5 division, and a much larger percentage in the first or A0 division. Thus, the smaller catalogs reflect the dominance of the simpler combinations of elements.

A logical subdividing measure for these major elementary A-digit divisions is the total heteroatomic count, designated as the T-digit and meaning all atoms other than carbon or hydrogen. It gives a somewhat better distribution than the O-atom count (with the same maximum of nine and more), and provides better correlations. Thus the T1 division associates all simple alcohols, ethers, amines, and halides.

A third natural decimal measure that gives striking distributional uniformity is designated as the C-digit because it represents simply the units part of the long-featured carbon-atom count; that is, the digit 2 for C2, C12, C22, etc., or C102. The data in Table VIII show how well some 994 randomly selected examples (the first compound on every even-numbered page of the Chemical Abstracts' Formula Indexes) are distributed among the 90 organic subdivisions defined by the A-digit and C-digit values. The effectiveness of these two simple decimal measures is proved by the large number of average-sized subdivisions. Only four of the ninety subdivisions contain more than twice the average (more than 20 compounds), and only five contain less than half the average (less than 5). Confirming proof of a profound statistical balance is evident in the last column of Table VIII, where the percentage figures show how well the C-digit itself divides the collection into ten almost exactly equal sections.

Table VIII. Distribution of Compounds among A-C Subdivisions
(994 random organic examples from 1942-51 C.A. pages)

C-Digit            A-Digit Values (Major Divisions)                 C-Value
Values       0     1     2     3     4     5     6     7     8     Subtotals    %
0           20     9    11    12    14    17    11     8     9       111       11
1           16     8    12     8     2    19     1    18     7        91        9
2           12     6     8    18    10    18    13    14    10       109       11
3           20     8    10     9     8    10     6    13     3        87        9
4           14    12     9     7    14    18    11     8    11       104       10
5           13     8    11    11     4     9     9    13     6        84        8
6           10    12     9     7     5    23    11    13    21       111       11
7           19     6     7    12     9    16     6    14     4        93        9
8           17    14     5     9     6    24     8     9    16       108       11
9           15    13     8     8     9    22     6    12     3        96       10
A-Digit
Subtotals  156    96    90   101    81   176    82   122    90       994       99

A fourth numerical measure that is somewhat more difficult to understand is designated as the H-digit because it is integrated from the tens and units digits of the H-atom count. The units digit alone gives poor discriminating efficiency because odd numbers of H-atoms cannot occur among the compounds in the A0, A2, and A4 elementary divisions; and even numbers cannot occur in the A1 division. Therefore, the H-digit is derived from the true H-atom count as shown here:

H-Digit   True H-Atom Count
0         0   19   28   37   46   55   64   73   82   91   100
1         1   10   29   38   47   56   65   74   83   92   101
2         2   11   20   39   48   57   66   75   84   93   102
3         3   12   21   30   49   58   67   76   85   94   103
4         4   13   22   31   40   59   68   77   86   95   104
5         5   14   23   32   41   50   69   78   87   96   105
6         6   15   24   33   42   51   60   79   88   97   106
7         7   16   25   34   43   52   61   70   89   98   107
8         8   17   26   35   44   53   62   71   80   99   108
9         9   18   27   36   45   54   63   72   81   90   109
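The four formula digits defined so far can be computed mechanically from an empirical formula. The sketch below is offered only as an illustration for organic formulas: the function names are hypothetical, the H-digit rule (sum of the tens and units digits, keeping the last digit of that sum) is read off the listing above, and the treatment of element combinations that are not named in the A-digit list (they are all placed in division 8) is an assumption.

```python
import re

HALOGENS = {"F", "Cl", "Br", "I"}


def parse_formula(formula):
    """Turn a formula such as 'C9H10O2' into a dict of atom counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def a_digit(counts):
    """Elementary division 0 to 8 of an organic formula."""
    others = set(counts) - {"C", "H", "O"}
    if not others:
        return 0
    if others == {"N"}:
        return min(counts["N"], 3)   # N1, N2, or three and more
    if others == {"S"}:
        return 4
    if others == {"S", "N"}:
        return 5
    if others <= HALOGENS:
        return 6
    if others - HALOGENS == {"N"}:
        return 7
    return 8                         # assumed home of all other combinations


def h_digit(h_count):
    """Tens digit plus units digit of the H-atom count, last digit kept."""
    return (h_count // 10 % 10 + h_count % 10) % 10


def atch(formula):
    """Return the four-digit A-T-C-H file number of an organic formula."""
    counts = parse_formula(formula)
    if "C" not in counts:
        raise ValueError("carbon-free formulas need the inorganic definitions")
    t_digit = min(sum(n for el, n in counts.items() if el not in ("C", "H")), 9)
    c_digit = counts["C"] % 10
    return f"{a_digit(counts)}{t_digit}{c_digit}{h_digit(counts.get('H', 0))}"


if __name__ == "__main__":
    for formula in ("C9H10O2", "C8H11N", "C6H4Cl2", "C7H16O"):
        print(formula, atch(formula))
```

Isomers necessarily share one file number, and a few formulas of different size may share it as well; that is the price paid for four digits that can be counted at sight.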

The effectiveness of the four A, T, C, and H digits in independently decimating a comprehensive Formula Index has been demonstrated with the 6500 organic compounds in the Lange Handbook (9), and with the 5500 in the Hodgman Handbook (7). The 25 largest groups of compounds with the same empirical formula in either of these handbooks contain 10 to 29 members. These groups increased by an average of only two additional compounds when classified by the much simpler A-T-C-H digits. Furthermore, no new A-T-C-H divisions appear from other sets of formula isomers that are larger than these. This striking leveling effect of the four natural numerical measures is illustrated in Table IX.

Inorganic compounds contain no carbon atoms, so for these compounds, the C-digit indicates the largest periodic group number of the metallic elements in the given formula, up to the value 7 for the Mn group, 8 for the Fe group, and 9 for the Co group. Similarly, for these compounds, a redefined H-digit indicates the largest "contravalent" periodic group number, starting with 0 for the inert gases, 1 for the halogens, etc., and concluding with 8 for the Ni group. Thus the A-T-C-H number for FeCl2 is 9-3-8-1, and that for K2Cr2O7 is 9-9-6-2 (or just 9381 and 9962).

Ring Indexes also are important because rings characterize the structure and contribute peculiar chemical attributes like aromatic character. Fortunately, a single ring-indexing digit has been found that seems sufficient in itself to complement the four formula digits. This fifth measure is designated as the Ring-digit or B-digit (the full sequence of digits gives a B-A-T-C-H number, a useful memory aid) because its divisions show an exact parallel with those defined by the A-digit: a parallel between numbers of benzene rings and numbers of nitrogen atoms, between other monocyclic compounds and sulfur compounds, between bicyclic compounds and halogen compounds, between tricyclic structures and all other organic formulas, and finally between the remaining polycyclic structures and inorganic formulas. This parallel is best understood by comparing the respective A- and B-definitions.

Table IX. Analysis of Largest Formula Index Divisions
(All A-T-C-H divisions with more than 20 isomers in either the Hodgman or Lange Handbook lists)

No. in A-T-C-H       A-T-C-H        No. in Largest         Largest
Division             File           Formula Division       Formula
Hodgman   Lange      Numbers        Hodgman   Lange        Division
27        31         0291           25        29           C9H10O2
24        28         1182           23        26           C8H11N
22        28         0203           20        26           C10H12O2
27        27         0388           27        27           C8H8O3
18        26         0287           15        22           C8H16O2
26        7          0075           24        5            C7H14
25        23         0005           21        20           C10H14
20        24         0165           20        23           C6H14O
19        23         6264           18        22           C6H4X2
23        13         0063           19        9            C6H12
22        22         0263           21        21           C6H12O2
22        21         0281           21        20           C8H10O2
10        22         6189           10        22           C8H9X
19        21         0177           18        19           C7H16O
19        21         0251           13        15           C5H10O2
5         21         6446           3         13           C14H6O2Cl2

Sizes of these formula divisions increase in proportion to the square root of the list size; thus the largest formula divisions in the 500,000-item C.A. Cumulative Formula Index contain around 200 isomers each.

The impressive labor-saving power of the decimating numeric measures can be illustrated with a single example: a deck of 10,000 cards can be reduced to 1000 by sorting on the first specified digit; these 1000 cards in turn can be reduced to 100 by the second digit, and the 100 to 10 by the third digit. The total number of card-passes through the sorter is only 11,100, a mere twenty minutes of machine time with a standard model that processes 650 cards per minute!

If cards were made for each of the 500,000 chemicals in the Chemical Abstracts Cumulative Formula Index, and if these were stored by rows and columns in accordance with the C-digit and A-digit values, the largest resulting A-C partition would contain some 12,000 cards (see Table VIII). Thus in a formula-index search, this largest segment would be reduced to a mere dozen or so by three simple sorting operations on the remaining B-T-H decimating numbers.

Comprehensive searching and correlating versatility could be made available in a ring-formula catalog, stored by the B- (ring) and A-digits, for the resulting B-A partitions follow an idealized Beilstein arrangement.
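The card-pass arithmetic quoted above for the 10,000-card deck can be checked with a brief sketch. It assumes the idealized case in which each sorted digit keeps exactly one tenth of the cards; the function name `sorting_passes` is hypothetical, and the sorter speed is the 650 cards per minute mentioned in the text.

```python
def sorting_passes(deck_size, digits):
    """Return the cards fed on each successive sort and the cards remaining."""
    passes = []
    remaining = deck_size
    for _ in range(digits):
        passes.append(remaining)   # every remaining card goes through once
        remaining //= 10           # idealized: one pocket in ten is kept
    return passes, remaining


if __name__ == "__main__":
    passes, remaining = sorting_passes(10_000, 3)
    total = sum(passes)
    print("cards fed per sort:", passes)
    print("total card-passes:", total)
    print("cards remaining:", remaining)
    print("minutes at 650 cards per minute:", round(total / 650, 1))
```

Under these assumptions the machine time comes out a little below the round figure of twenty minutes given in the text.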
In this ring-digit array of cards, an eleventh "top" B-row would appear for all chemicals with an undefined or blank value in the B-digit position. (In this same row, the blank A-digit position provides a specific place for drugs of undetermined empirical formula.) Columns 71 to 75 in the "chemical identification" cards carry these B-A-T-C-H numbers that provide a "streamlined" ring-formula index; columns 76 to 78 in this field are reserved for supplementary serial designations (arbitrary number or letter assignments) that provide concise yet fully specific identifications for all catalog entries, regardless of name or notation. (If letters and numbers are used in columns 76 to 78 for the above-mentioned drugs of undetermined empirical formula, more than 39,000 designations are provided, without any O or I ambiguities.) These ten equally important cyclic divisions therefore are as follows:

Ring-Digit
   (B)      Definition                                                              % of Catalog
    0       Open-chain structures                                                    21.5   (14)
    1       Monophenyl derivatives                                                   20.8   (23)
    2       Biphenyl or bis-phenyl derivatives                                        8.1   (11)
    3       Polyphenyl and benzoquinone derivatives                                   1.3   ( 4)
    4       Other monocyclic structures                                              11.1   ( 8)
    5       Monocyclic-(poly)phenyl derivatives                                       9.4   (10)
    6       Bi- or bis-cyclic structures (nonphenyl)                                 10.7   (12)
    7       Bi- or bis-cyclic structures with (poly)phenyl branches                   5.4   ( 8)
    8       Tri- or tris-cyclic structures, with or without phenyl-ring branches      7.7   ( 7)
    9       All other multi- or poly-cyclic structures (and phenyl branches)          3.9   ( 3)

The B6 and B7 divisions include bis-monocyclic structures like nicotine, as well as the bicyclic structures like naphthalene. Likewise, "tris-cyclic" structures include tris-monocyclic and bicyclic-monocyclic combinations as well as the rarer tricyclic systems. Bis-bicyclic combinations thus belong in the last and smallest division. If the ring divisions were not defined in this "integrated" manner, the large ones would become still larger, and the small ones still smaller.

The first column of percentage figures represents the distribution found for the 50,000 biologically tested compounds that are cataloged in the National Research Council's Chemical-Biological Coordination Center (10). The figures in parentheses represent the corresponding distribution in the larger but less representative Beilstein Handbook (many of these are merely identification derivatives). Only 0.6% of the NRC-CBCC compounds are tetracyclic and larger systems other than sterol derivatives; some 1.4% are steroids, and the remaining 1.9% in the B9 division consists of bis-bicyclic, etc., combinations.
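The figure of "more than 39,000" serial designations for columns 76 to 78 can be reproduced with a small count. This is only a sketch: it assumes the O/I ambiguity is avoided by dropping the letters O and I while keeping the digits 0 and 1, which the text does not spell out, and the function names are hypothetical.

```python
import string


def designation_alphabet() -> str:
    """Digits plus capital letters, minus the letters O and I.

    Assumption: the letters (not the digits 0 and 1) are the ones
    dropped to avoid the O/0 and I/1 confusion mentioned in the text.
    """
    candidates = string.digits + string.ascii_uppercase
    return "".join(c for c in candidates if c not in "OI")


def designation_capacity(columns: int = 3) -> int:
    """Number of distinct designations that fit in the given columns."""
    return len(designation_alphabet()) ** columns


print(len(designation_alphabet()), designation_capacity())
```

With 34 usable characters in each of three columns the count is 34 cubed, or 39,304, which agrees with the "more than 39,000" stated for this field.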

Short Name Identifications

The last 13 columns of the 43-column Notation field on the "chemical identification" cards are reserved for a short name identification when one is established. Thus, if a very complex structure has no short name and requires 30 to 43 columns for its notation, no card capacity is lost; and in the few cases where such a very complex structure has a short name like hemoglobin, the complete structure description is given on a second card (see discussion in "Toxicity Literature").

Short name identifications, though seldom systematic, have remarkable survival strength; and this endurance is a proof of their continuing usefulness. For example, more than a century ago, many index entries for organic treatises were names like acetal, acetone, acetic acid, aconitic acid, acroleine, adipic acid, alcohol, alizarine, allantoine, and alloxan. These convenient word identifications still are the main entries for these compounds in the latest Merck Index (with no spelling change other than possible loss of the terminal e).

Inorganic crystallographic types likewise still are identified by chemically unrevealing names such as the diamond, zinc blende, zincite, fluorite, diaspore, cuprite, pyrite, ilmenite, calcite, and perovskite forms. These old-fashioned mineralogical names are useful particularly in cases where the formula alone is insufficient to identify polymorphous solids.

Forensic medicine undoubtedly continues to perpetuate legally established word identifications such as atropine, cocaine, curarine, hemoglobin, heroin, morphine, reserpine, and strychnine. "Systematic" names for these structures certainly can be devised, but they would be so complex and forbidding in appearance that their usefulness would be extremely difficult to demonstrate. In contrast, these established names are virtually indispensable dictionary identifications.
Pesticide name identifications like the recently assigned aldrin, allethrin, chlordan, dieldrin, heptachlor, lindane, malathion, methoxychlor, parathion, schradan, toxaphene, and warfarin have a far-reaching practical value that hardly needs elaboration.

All of the above names are sufficiently concise to be printed in the provided 13-column name field. Thousands of longer names also can be contracted to this size through the omission of terminating letters, or through the use of very simple combinations such as the international atomic symbol plus its valence number, when this identifies the first part of a name. Many names for salts and esters consist of just two parts; punched-card operations on both parts obviously can be made if the first part is confined to the first three columns, and the second part to the last nine columns of a 13-column name field. A numeral in the third column can represent either the valence of a symbolized metallic ion, or the multiplicity of a monovalent radical. Thus the contraction

AG  CHLORIDE     means   silver chloride
BU2 OXALATE      means   dibutyl oxalate
ET  LINOLEATE    means   ethyl linoleate
FE2 SULFATE      means   ferrous sulfate
FE3 CHLORIDE     means   ferric chloride
ME3 PHOSPHATE    means   trimethyl phosphate

Table X contains some of the hundreds of anionic or terminating names (like -ANILINE, -BENZENE, -PYRIDINE, or -THIOPHENE) that can be suitably identified within the last nine columns (35 to 43) of the name field. The iso prefix is contracted to the letter I in both parts of the name, as in IBU ICYANATE and IPR IVALERATE. Likewise, the chloroacetate names are contracted to CL1ACETAT, CL2ACETAT, and CL3ACETAT. The name for the AsS4 anion similarly is contracted to S4ARSENAT, and analogous compound names are contracted in the same manner.
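The two-part layout described above (first part in the first three columns, second part in the last nine columns of the 13-column name field) can be sketched as follows. This is a minimal illustration, not the author's procedure: the function names are hypothetical, and it assumes that the one column left over between the two parts is simply blank.

```python
from typing import Optional, Tuple

NAME_FIELD_WIDTH = 13   # width of the short-name field described in the text
FIRST_WIDTH = 3         # first part: symbol plus optional numeral
SECOND_WIDTH = 9        # second part: anionic or terminating name
# Assumption: the single remaining column is a blank separator.
GAP = NAME_FIELD_WIDTH - FIRST_WIDTH - SECOND_WIDTH


def pack_name(first: str, second: str) -> str:
    """Lay out a two-part contracted name in a 13-character field."""
    if len(first) > FIRST_WIDTH or len(second) > SECOND_WIDTH:
        raise ValueError("part too long for its columns")
    return (first.upper().ljust(FIRST_WIDTH)
            + " " * GAP
            + second.upper().ljust(SECOND_WIDTH))


def unpack_name(field: str) -> Tuple[str, str]:
    """Recover both parts from a packed 13-character field."""
    if len(field) != NAME_FIELD_WIDTH:
        raise ValueError("field must be exactly 13 characters")
    return field[:FIRST_WIDTH].strip(), field[-SECOND_WIDTH:].strip()


def split_numeral(first: str) -> Tuple[str, Optional[int]]:
    """Split 'FE2' into ('FE', 2); a part without a numeral gives None."""
    if len(first) == FIRST_WIDTH and first[-1].isdigit():
        return first[:-1], int(first[-1])
    return first, None


field = pack_name("FE2", "SULFATE")
print(repr(field), unpack_name(field), split_numeral("FE2"))
```

Because each part sits in fixed columns, a sorter could select on either part independently, for example all chlorides or all ferrous salts. Whether the numeral stands for a valence or a multiplicity depends on the symbol, as the text notes, so the sketch returns it uninterpreted.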


Table X. Short Anionic or Terminating Names

Acetate     Cinnamate   Hyponitit   Oleate      Soyate
Aconitate   Citrate     Hypopsfat   Osmate      Stannate
Acrylate    Cyanamide   Hypopsfit   Oxalate     Stannide
Adipate     Cyanate     Hyposlfit   Oxanilate   Stannite
Alum        Cyanide     Ibutyrate   Oxide       Stearate
Aluminate   Decoate     Icyanate    Palmitate   Stibnate
Amide       Dichromat   Icyanide    Perborate   Stibnite
Anisate     Dioxide     Iodate      Perbromat   Succinate
Anthranlt   Disulfide   Iodide      Perclorat   Sulfanlat
Arsenate    Ethoxide    Itaconate   Periodate   Sulfate
Arsenide    Ferrate     Ithiocynt   Permangnt   Sulfide
Arsenite    Ferrite     Ivalerate   Peroxide    Sulfite
Azide       Fluoborat   Lactate     Perphsfat   Tannate
Benzoate    Fluocrmat   Laurate     Persulfat   Tantalate
Bicarbnat   Fluosilat   Levulinat   Perthiont   Tartrate
Bifluorid   Fluotitat   Linoleate   Phosphate   Telluride
Bismate     Fluoride    Linoresnt   Phosphide   Tellurite
Bisulfate   Formate     Maleate     Phosphite   Thiocyant
Bisulfide   Fulminate   Malonate    Phthalate   S4Arsenat
Bisulfite   Fumarate    Mandelate   Picramate   S4Bismate
Borate      Furoate     Manganate   Picrate     S3Stannat
Boride      Gallate     Manganite   Platinate   S4Stibnat
Bromate     Germanate   Mercaptan   Platinite   Thoriate
Bromide     Gluconate   Methacrlt   Plumbate    Titanate
Butyrate    Glutamate   Molybdate   Plumbide    Toluate
Carbamate   Glutarate   Myristate   Propionat   Tungstate
Carbide     Glycphsft   Naphthent   Resinate    Uranate
Carbonate   H4Borate    Naphthoat   Rhenate     Valerate
Caseinate   Hafniate    Nicotinat   Ricinate    Vanadate
Cerate      Heptoate    Nitrate     Salicylat   Xanthate
Chlorate    Hexoate     Nitride     Sebacate    Zincate
Chloride    Hippurate   Nitrite     Selenate    Zirconate
Chlorite    Hydride     Nonoate     Selenide
Chromate    Hydroxide   0 acetate   Selenite
Chromite    Hypoclrit   Octoate     Silicate

Numeral is a multiplier of preceding symbol: H4Borate is tetrahydroborate.

Initiating symbols ME, ET, PR, BU, AM or PE, HX, HP, and OC should be obvious contractions for the n-alkyl names. Like AC, BZ, and PH, these two-letter symbols can be combined with a numeric multiplier in column 34; but the multiplier must be omitted for three-letter contractions like ALY (allyl), AMI (amino), AMM (ammonium), BZL (benzyl), DEC (decyl), ICY (isocyano), IPR (isopropyl), LRL (lauryl), NON (nonyl), TOL (tolyl), and the like. Prefixed letters or numbers precede the name field; thus in column 29 the lone letter M is meta, O is ortho, P is para, R is racemic or dl, S is secondary (or symmetric), T is tertiary, and V is vicinal. A contracted name like 2356BR4 Phenol would begin in column 27, and should be understood to mean 2,3,5,6-tetrabromophenol.

Dyes, indicators, and pigments usually earn names that have an obvious practical meaning, such as Aniline Black, Bismarck Brown, Indigo Blue, Guinea Green, Butter Yellow, Methyl Orange, Lithol Red, Methyl Violet, and the recently popular Methyl Purple. Where the name ends in the color identification, as in these cases, the color correlations can be made with punched cards by using the last three columns (41 to 43) for code words such as RED, ORG, YEL, GRN, BLU, VLT, PRP, BRN, BLK, and WHT. These are used only as parts of the name, not as physical descriptions.

The name contractions suggested in this section do not add any system to the seemingly chaotic collection of established word identifications, nor is any systemization intended. Instead, the naming procedures should be kept as liberal and flexible as possible, so that eventual improvements are not excluded. At the present state of development, the short name should supplement or complement the systematic notation and the formula numbers in the same way that a "bird's-eye view" supplements an architectural floor plan and the side views.

If a research preparation has no common name and no commercial value, but can be described precisely by the systematic notation, classified by this notation and the prefix mark, cross-indexed by the formula-ring numbers, etc., then the "need" for a name that is nothing more than a spoken constitutional formula seems to be more imaginary than real in this catalog. If the notation is short, and the corresponding systematic name can be contracted to fit within the 13-column name field (not counting prefix numbers), this is a welcome confirmation, but not a necessity.
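The color correlation mentioned earlier in this section, which reads a three-letter code word from the last three columns (41 to 43) of the name field, might be sketched as below. The expansions of the code words into full color names are this sketch's reading of the abbreviations, the sample field layouts are hypothetical, and the function name is illustrative only.

```python
from typing import Optional

NAME_FIELD_WIDTH = 13

# Code words listed in the text; the full color names are an assumed reading.
COLOR_CODES = {
    "RED": "red", "ORG": "orange", "YEL": "yellow", "GRN": "green",
    "BLU": "blue", "VLT": "violet", "PRP": "purple", "BRN": "brown",
    "BLK": "black", "WHT": "white",
}


def color_of(name_field: str) -> Optional[str]:
    """Return the color named by the last three columns, if any.

    Short inputs are padded with blanks on the right, so a name that
    does not reach the final columns yields None.
    """
    field = name_field.ljust(NAME_FIELD_WIDTH)[:NAME_FIELD_WIDTH]
    return COLOR_CODES.get(field[-3:])


# Hypothetical card layouts: name text at the left, color code at the right.
print(color_of("METHYL    ORG"))
print(color_of("PHENOL"))
```

Keeping the code word in fixed columns is what lets a sorter group all dyes of one color in a single pass, while names without a color ending are simply left blank in those columns.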


Acknowledgment

The author is very grateful for the assistance generously given by H. T. Bonnett in the preparation of this manuscript, and for the statistical data on notational divisions furnished by F. R. Benson and E. G. Smith.

Literature Cited

(1) Benson, F. R., Metalectro Corp., Laurel, Md., private communication.
(2) Chem. Eng. News, 31, 2719 (1953).
(3) Ibid., 32, 866 (1954).
(4) Craver, B. N., and others, Arch. Ind. Hyg. Occupational Med., 2, 280 (1950).
(5) Deichmann, W. B., and Mergard, E. G., Ibid., 30, 373 (1948).
(6) Drinker, P., and Cook, W. A., Ibid., 31, 51 (1949).
(7) Hodgman, C. D., "Handbook of Chemistry and Physics," 35th ed., p. 1230, Chemical Rubber Publishing Co., Cleveland, Ohio, 1953.
(8) Industrial Hygiene Foundation, Pittsburgh, Pa., "Publications Source List," 1953.
(9) Lange, N. A., "Handbook of Chemistry," 5th ed., p. 287, Handbook Publishers, Inc., Sandusky, Ohio, 1944.
(10) National Research Council, Washington, "The Chemical-Biological Coordination Center," 1954.
(11) Smith, A., and Freyder, M., Quart. Cum. Index Medicus, 46, 69 (1949).
(12) Smith, E. G., "A Punched Card Catalog of the Physical Properties of Some Common Organic Compounds" (A Faculty Report), University of Hawaii, 1954.
(13) Smyth, H. F., Jr., Am. Ind. Hyg. Assoc. Quart., 15, 203 (1954).
(14) Smyth, H. F., Jr., Carpenter, C. P., and Pozzani, U. C., J. Ind. Hyg. Toxicol., 31, 349 (1949).
(15) Wiswesser, W. J., "A Line-Formula Chemical Notation," Thomas Y. Crowell Co., New York, 1954.

RECEIVED November 18, 1954.
