EPA Chemical Information System - ACS Symposium Series

Dec 14, 1978 - Over the past seven years, NIH and EPA have developed a computer based Chemical Information System (CIS), which is an online interactiv...
10 The NIH/EPA Chemical Information System
STEPHEN R. HELLER Environmental Protection Agency, PM-218, Washington, DC 20460

Downloaded by EAST CAROLINA UNIV on January 4, 2018 | http://pubs.acs.org Publication Date: December 14, 1978 | doi: 10.1021/bk-1978-0084.ch010

G. W. A. MILNE National Institutes of Health, Bethesda, MD 20014

Over the past seven years, NIH and EPA have developed a computer-based Chemical Information System (CIS), an online interactive computer system that handles chemical and toxicological data (1). The CIS consists mainly of a collection of numeric (as opposed to bibliographic) data bases and the software to search them. The CIS can be grouped into four main areas:

1. Searchable numeric data bases
2. Structure and Nomenclature Search System (SANSS)
3. Chemical Substance Information System (CSIS)
4. Analysis and Modelling Programs

The first three areas will be described, with emphasis on the linking of areas 1 and 2. Figure 1 shows how the four areas of the CIS are coordinated, with the SANSS in the center. At present there are 25 data bases in the SANSS. These comprise the CIS Unified Data Base (UDB) and are searchable by the SANSS (2). They are shown in Figure 2.

The referral aspects of the CIS represent a valuable tool for scientific and administrative work, both within our respective Agencies and outside them, in the public and private sectors, in the USA and abroad. The referral capability of the CIS consists of a list of data bases, literature references (e.g., Merck Index) and Government regulatory files, all of which can be accessed simultaneously by consulting a single central file. All the available information concerning a substance can thus be located in a single operation. As the number of data bases increases, the CIS becomes a more valuable and time-saving device in searches for chemical information. Typical questions that can be answered readily and inexpensively by this approach are:

* Has this chemical been sold as a pesticide in the USA?
* Is there a measured acute toxicity value for a particular air pollutant?
* Is there information in the Merck Index or the NIMH book on psychotropic drugs concerning a drug taken in overdose quantities and identified by gas chromatography-mass spectrometry?
* Has a certain chemical been registered for sale in the USA?

This chapter not subject to U.S. copyright. Published 1978 American Chemical Society
Howe et al.; Retrieval of Medicinal Chemical Information ACS Symposium Series; American Chemical Society: Washington, DC, 1978.

Figure 1. The structure of the CIS with the CAS Registry number linking (CIS components, August 1978). Components legible in the diagram include WATER DROP, AMES TEST, X-RAY SINGLE CRYSTAL, AQUATIC TOXICITY, PARTITION COEFFICIENT, CTCP, CONGEN, TSCA PLANT & PRODUCTION, THERMODYNAMICS, SYNCHEM-2 and OHM-TADS. The legend distinguishes operational components, components under test and components under development, and separates components linked together via a network and the CAS Registry Number from those linked directly on the same computer.

FILE                                                        NUMBER OF COMPOUNDS

NIH/EPA-MSSS                                                     25,560
C-13 NMR                                                          3,765
EPA-ACTIVE INGREDIENTS IN PESTICIDES                              1,454
PESTICIDES STANDARDS                                                384
ORD - CHEMICAL PRODUCERS                                            375
OIL AND HAZARDOUS MATERIALS                                         858
AEROS/SAROAD                                                         65
AEROS/SOTDAT                                                        572
STORET                                                              234
CHEMICAL SPILLS                                                     577
TSCA INVENTORY CANDIDATE LIST                                    33,579
NIMH-PSYCHOTROPIC DRUGS                                           1,686
SRI-PHS LIST 149 OF CARCINOGENS                                   4,448
NBS - SINGLE CRYSTAL FILE                                        18,362
HEATS OF FORMATION OF GASEOUS IONS                                3,169
GAS-PHASE PROTON AFFINITIES                                         454
NSF-RANN POLLUTANT FILE                                             225
FDA-PESTICIDE REFERENCE STANDARDS                                   613
CPSC-CHEMRIC MONOGRAPHS                                           1,000
CAMBRIDGE UNIVERSITY CRYSTAL DATA                                10,018
EROICA THERMODYNAMIC DATA                                         4,492
MERCK INDEX                                                       8,894
ITC - INTERNATIONAL TRADE COMMISSION                              9,194
NIOSH-REGISTRY OF TOXIC EFFECTS OF CHEMICAL SUBSTANCES           19,908
NFPA-HAZARDOUS CHEMICALS                                            397

Figure 2. List of the 25 collections which currently comprise the CIS Unified Data Base (integrated SANSS data base, 3/1/78)


Among the data bases being added to the CIS this year are those shown in Figure 3. Over the next 2-3 years, with the continued addition of files that are either generated or used by the Government, the list of referral files is expected to grow to over 250. With the recent efforts of the four main Federal regulatory Agencies (EPA, FDA, CPSC, OSHA) to coordinate their various activities, such as the study and regulation of specific chemicals, this central referral system takes on more importance. This four-Agency group, known as the Interagency Regulatory Liaison Group (IRLG) (3), is now working to use the Chemical Abstracts Service (CAS) Registry Number as the standard chemical identifier in all four Agencies. An internal regulation has been proposed which will make this mandatory. The regulation is modelled after EPA Order 2800.2, currently the only Government regulation to mandate standardized chemical classification (4).

Over the past four years, some 170,000 chemical names have been submitted to CAS, under contract to EPA, to obtain the CAS Registry Numbers for these chemicals. The result of this massive and costly effort is the CIS Unified Data Base (UDB) of about 101,000 unique chemicals associated with the 25 files shown in Figure 2. That there is so much overlap among the chemicals found in these files is not surprising. It is beginning to appear that relatively few chemicals are actually studied in any detail, and even fewer become significant in commerce as, for example, drugs, food additives or pesticides.

Projections suggest that by the time the CAS registration of some 250 files is completed, the actual size of the CIS Unified Data Base will not exceed 175,000-200,000 substances. The need then will be to obtain as much useful and accurate information about these substances as is necessary to protect health and the environment in the USA, as required by the missions of our respective Agencies. It is our hope that defining the size or scope of the "real" universe of chemicals will lessen the burden on industry and make future efforts easier to direct. Thus, we see little immediate need to study the universe that CAS has defined, of some 4,000,000 chemicals found in the literature that CAS has abstracted since 1965. Only about 12% of these four million have appeared more than once in the CAS-abstracted literature, and probably no more than 3% are produced and sold in anything but research quantities.

U.S. Coastguard Chemical Properties File.
EPA, IERL, Non-Criteria Pollutant Emissions.
EPA, Section 111A of the Clean Air Act.
EPA, Office of Air Quality, Permissible Standards, Criteria Pollutants.
EPA, Office of Water Supply, File of Drinking Water Pollutants.
EPA, Pollutant Strategies Branch, Selected Organic Air Pollutants.
EPA, Effluent Guidelines Consent Decree List.
EPA, Section 112 of the Clean Air Act.
EPA, ORD, Gulf Breeze, List of Chemicals.
EPA, Carcinogen Assessment Group List of Chemicals.
EPA, List of Potentially Hazardous Chemicals from Coal and Oil.
California OSHA List of Chemical Contaminants.
WHO, Food and Agriculture Organization, List of Pesticides.
EPA, IERL, Organic Chemicals in Air.
NCI, Public List of Carcinogens.
NCTR, Potential Industrial Carcinogens and Mutagens.
EPA, IERL, List of Environmental Carcinogens.
EPA, OPP, Pesticide Literature Searches.
NIEHS, Laboratory Chemicals.
Toxic and Hazardous Industrial Chemicals Safety Manual. International Technical Information Institute, Tokyo.
EPA, RPAR Candidates Chemical Review Schedule List.
List of Teratogenic Chemicals. Medical Information Center, Karolinska Institute, Stockholm.
EPA, OTS Status Assessments.
EPA, List of Hazardous Pesticides.
EPA, Standing Air Monitoring Work Group List of Non-Criteria Pollutants.
EPA, Mutagenicity Studies.
CITT, List of Candidates.
EPA, ORD-OHEE Chemicals.
EPA, TSCA Section 8e, List of Chemicals.

Figure 3. New files being added to the NIH/EPA CIS UDB in Spring, 1978

Structure and Nomenclature Search System (SANSS)

The Structure and Nomenclature Search System (SANSS), the heart of the CIS, is based upon the work of Feldmann, who developed the original search algorithms a number of years ago (5). The addition of a nomenclature search program, an identity search program and a search program based on the Edgewood CIDS structure keys (6), as well as considerable refinement of the system, has been carried out over the last few years. The SANSS and its data base, which consists of connection tables from CAS and chemical names, have absorbed the bulk of the CIS budget. Currently, the SANSS can be used in a number of ways. The more important methods are:

* Nomenclature search (NPROBE)
* Ring search (RPROBE)
* Fragment search (FPROBE)
* CIDS code search (SPROBE)
* Molecular weight search (MW)
* Molecular formula search (MF)
* Substructure search (SUBSS)
* Full structure search (IDENT)

In addition to these searching programs, a number of retrieval and display options are available in the system. These include:

* Display of chemical structure
* Display of CAS Collective Index names
* Display of synonyms, common names and trade names
* Display of molecular formulas
* Display of files containing a substance
* Retrieval based upon CAS Registry Number

The following sections explain the various SANSS modules and give examples of how they can be used. At the end of the chapter, the interfacing of the SANSS with the NIOSH RTECS data base of acute toxicity data (7) will be described as an example of the direction that CIS development is taking. Since there is considerable interest on the part of the chemical industry in the implementation of TSCA, access to the bulk of the public data that EPA will be using in administering TSCA should be of value. At present, development of the SANSS is being directed towards the immediate needs of EPA's Office of Toxic Substances (OTS), so that the foundation that has been built for the SANSS can be used most effectively for the implementation of TSCA.
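Because the CAS Registry Number serves both as the link between files and as a retrieval key, it can be useful to check a number before using it. The chapter does not describe any such check inside the CIS; the following is only a minimal Python sketch of the standard CAS check-digit rule (the last digit equals the weighted sum of the preceding digits, modulo 10). The function names are hypothetical.

```python
import re

def cas_check_digit(body_digits: str) -> int:
    """Return the check digit implied by the digits that precede it."""
    # The rightmost body digit has weight 1, the next weight 2, and so on.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body_digits), start=1))
    return total % 10

def is_valid_cas(rn: str) -> bool:
    """True if rn has the form NNNNNNN-NN-N and its check digit agrees."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    return cas_check_digit(body) == int(match.group(3))

# Registry Numbers that appear in the figures of this chapter.
for rn in ("50-29-3", "50-37-3", "6425-41-8", "52-86-8", "67-81-2"):
    print(rn, is_valid_cas(rn))
```

A check of this kind only catches transcription errors; it says nothing about whether the number is actually assigned to a substance in the UDB.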

Name - Nomenclature Search (NPROBE)

The name search, NPROBE, was implemented in response to requests from both the SANSS user community and the CEQ-TSCA MITRE study proposal (8) for the development of a Chemical Structure and Nomenclature System, which we have called the Structure and Nomenclature Search System. The software is similar to that used in the CHEMLINE system at the National Library of Medicine (NLM) and allows for complete or partial (fragment) name searches. There are on average slightly over 3 names per chemical in the CIS UDB, as opposed to slightly more than 2 names per chemical in CHEMLINE (9). The CHEMLINE file, which links primarily to the TOXLINE literature references, is made up mostly of research chemicals, and thus is not likely to have the multiple synonyms that are associated with commercial chemicals. In the CIS UDB, which is composed of files from primarily regulatory, and hence commercial, sources, there are the expected additional names associated with materials in commerce.

To conduct a nomenclature search, the user simply enters a chemical name or name fragment, as shown in Figure 4, which is a search for any substance in the UDB whose name contains the fragment "DDT". Figure 4 shows that there are 12 such substances in the UDB, of which the first, p,p'-DDT, is displayed. Also shown are all the files of the UDB which contain information on p,p'-DDT, with the local file identifier numbers listed so that one may go directly to a particular file and retrieve the information it contains on p,p'-DDT.

In Figure 5, a name search for the fragment "LSD" was performed on the entire UDB and five compounds were found. The first of these five is shown in Figure 5, with the names of the files that have information about LSD. Not surprisingly, the files include the NIMH List of Psychotropic Drugs, the Merck Index and the NIOSH acute toxicity data base, as well as the NIH/EPA Mass Spectral Data Base and the TSCA Candidate List. There is little doubt that the inclusion on the TSCA Candidate or "Strawman" list will be changed once the final TSCA inventory is published, since under present law LSD is an illegal chemical substance.

This is a useful search technique, but it requires a large list of synonyms, correct spelling, and a knowledge of how chemical names are broken down. For example, in searching for a cyclohexanedione, if the file name of the substance is written as 2,5-cyclohexanedione rather than cyclohexan-2,5-dione, a search for "dione" will not find the chemical.
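The cyclohexanedione pitfall can be illustrated with a small Python sketch of a fragment index. It assumes, purely for illustration, that a name is broken into fragments at every character that is not a letter or digit and that a fragment search matches whole fragments only; the chapter does not give the actual NPROBE rules. The keys "example-1" and "example-2" are placeholders, not real Registry Numbers.

```python
import re
from collections import defaultdict

def name_fragments(name):
    """Split a chemical name into lower-case fragments (assumed rule)."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", name.lower()) if tok]

def build_fragment_index(names_by_id):
    """Map each fragment to the set of substance ids whose names contain it."""
    index = defaultdict(set)
    for substance_id, names in names_by_id.items():
        for name in names:
            for fragment in name_fragments(name):
                index[fragment].add(substance_id)
    return index

def nprobe(index, fragment):
    """Return the ids of all substances having the given name fragment."""
    return sorted(index.get(fragment.lower(), set()))

names_by_id = {
    "50-29-3": ["p,p'-DDT", "p,p'-Dichlorodiphenyltrichloroethane"],
    "example-1": ["2,5-cyclohexanedione"],   # "dione" is buried in one fragment
    "example-2": ["cyclohexan-2,5-dione"],   # "dione" is a fragment of its own
}
index = build_fragment_index(names_by_id)
print(nprobe(index, "DDT"))
print(nprobe(index, "dione"))
```

Under these assumptions the search for "dione" retrieves only the second spelling, which is why a generous synonym list matters for name searching.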

Functional Group - CIDS Key Search (SPROBE)

The best way to search for functional groups or structure features in the CIS SANSS is to use the Chemical Information Data Systems (CIDS) keys, developed by Edgewood Arsenal. The CIDS keys, a few of which are shown in Figure 6, are the basis of a rapid and efficient way to search the CIS UDB for substances containing a particular functional group or structure feature. Many of the CIDS keys are quite specific in nature, as can be seen in Figure 6. Others, shown towards the bottom of Figure 6, are quite generic. For example, the CIDS key FG25 refers to the presence of a nitrile or cyanide group in the molecule. An example of a CIDS key search is given in Figure 7, which shows a search for all cyclohexyl (SCN49) morpholine (SCN35) compounds in the NIOSH RTECS data base of acute toxicity. There are only two such compounds in the data base, and the first of these is printed out in the figure, along with its local NIOSH RTECS identifier numbers.
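A key search of this kind can be pictured as a lookup in an inverted index followed by an intersection. The Python sketch below is an assumption about how such a search might be organised, not a description of the SANSS code; the index contents, the ids "hypo-A" and "hypo-B", and the function name are hypothetical. The multiplicity limits mirror the "permissible multiplicity limits" prompt of Figure 7.

```python
def sprobe(key_index, requirements):
    """Return substances satisfying every (key, min_count, max_count) requirement.

    key_index maps a structural feature code to {substance id: occurrences}.
    max_count may be None, meaning no upper limit.
    """
    result = None
    for key, low, high in requirements:
        hits = {substance for substance, count in key_index.get(key, {}).items()
                if count >= low and (high is None or count <= high)}
        # Each further key narrows the answer set (a logical AND).
        result = hits if result is None else result & hits
    return result if result is not None else set()

# Toy index: only 6425-41-8 (4-cyclohexylmorpholine, Figure 7) is real.
key_index = {
    "SCN49": {"6425-41-8": 1, "hypo-A": 2},
    "SCN35": {"6425-41-8": 1, "hypo-B": 1},
}
both = sprobe(key_index, [("SCN49", 1, None), ("SCN35", 1, None)])
print(sorted(both))
```

Keeping an occurrence count per substance, rather than a bare membership set, is what allows a query such as "two or more cyclohexyl groups" to be answered from the same index.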

    OPTION? NPROBE
    FRAGMENT OR WHOLE NAME SEARCH (F/W) (F) ? F
    SPECIFY FRAGMENT (CR TO EXIT): DDT
    FILE 1, 12 COMPOUNDS HAVING FRAGMENT: DDT
    SPECIFY FRAGMENT (CR TO EXIT): _
    OPTION? SSHOW 1
    HOW MANY STRUCTURES (E TO EXIT) ? 1
    TYPE E TO TERMINATE DISPLAY
    STRUCTURE 1 CAS REGISTRY NUMBER 50-29-3
    TSCA CANDIDATE LIST: R000-2373
    CIS MASS SPECTROMETRY
    CIS CARBON 13 NMR SPECTROMETRY: 50-29-3.01
    EPA PESTICIDES - ACTIVE INGREDIENTS: 29201
    EPA OHM/TADS: 72T16510
    CAMBRIDGE XRAY CRYSTAL: 50-29-3.01
    MERCK INDEX
    EPA PESTICIDES - ANALYTICAL REF. STNDS.: 1880, 1920
    EPA STORET: 39317, 39373, 39371, 39374, 39372, 39370, 39359, 39375,
        39376, 39378, 39290, 39358, 39377, 39302, 39303, 39304, 39300, 39301
    EPA CHEMICAL SPILLS
    CPSC CHEMRIC
    FDA/EPA PESTICIDES REF. STANDARDS: 200
    U.S. INTERNATIONAL TRADE COMMISSION
    NBS XRAY CRYSTAL: 50-29-3.01
    NSF CHEMICALS LIST: 138
    PHS-149 CARCINOGENS: A0240
    NIOSH RTECS: KJ33250
    C14H9CL5
    [Structure diagram of p,p'-DDT]
    Benzene, 1,1'-(2,2,2-trichloroethylidene)bis[4-chloro- (9CI)
    Ethane, 1,1,1-trichloro-2,2-bis(p-chlorophenyl)- (8CI)
    .alpha.,.alpha.-Bis(p-chlorophenyl)-.beta.,.beta.,.beta.-trichlorethane
    p,p'-Dichlorodiphenyltrichloroethane
    p,p'-DDT

Figure 4. NPROBE name search for name fragment "DDT"
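The referral listing of Figure 4, in which one CAS Registry Number leads to every file holding data on the substance together with the local identifiers used in that file, can be modelled as a single central index. The following Python sketch is illustrative only: the class and method names are hypothetical, and only a few of the entries printed in Figures 4 and 5 are loaded.

```python
from collections import defaultdict

class ReferralIndex:
    """Central file: CAS Registry Number -> {file name: [local identifiers]}."""

    def __init__(self):
        self._files = defaultdict(dict)

    def add(self, cas, file_name, local_ids=()):
        """Record that file_name holds data on cas under the given local ids."""
        self._files[cas].setdefault(file_name, []).extend(local_ids)

    def files_for(self, cas):
        """All files that contain the substance, in alphabetical order."""
        return sorted(self._files.get(cas, {}))

    def local_ids(self, cas, file_name):
        """Identifiers under which the substance is stored in one file."""
        return list(self._files.get(cas, {}).get(file_name, []))

index = ReferralIndex()
# p,p'-DDT (entries taken from Figure 4)
index.add("50-29-3", "TSCA CANDIDATE LIST", ["R000-2373"])
index.add("50-29-3", "MERCK INDEX")
index.add("50-29-3", "PHS-149 CARCINOGENS", ["A0240"])
index.add("50-29-3", "NIOSH RTECS", ["KJ33250"])
# LSD (entries taken from Figure 5)
index.add("50-37-3", "NIMH PSYCHOTROPIC DRUGS", ["273"])
index.add("50-37-3", "NIOSH RTECS", ["KE42000", "KE41000", "KE43750"])

print(index.files_for("50-29-3"))
print(index.local_ids("50-37-3", "NIOSH RTECS"))
```

A question such as "is there a measured acute toxicity value for this substance?" then reduces to asking whether "NIOSH RTECS" appears in `files_for` for its Registry Number.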

    OPTION? NPROBE
    FRAGMENT OR WHOLE NAME SEARCH (F/W) (F) ? F
    SPECIFY FRAGMENT (CR TO EXIT): LSD
    FILE 5, 5 COMPOUNDS HAVING FRAGMENT: LSD
    SPECIFY FRAGMENT (CR TO EXIT): _
    OPTION? SSHOW 5
    HOW MANY STRUCTURES (E TO EXIT) ? 1
    TYPE E TO TERMINATE DISPLAY
    STRUCTURE 1 CAS REGISTRY NUMBER 50-37-3
    TSCA CANDIDATE LIST: R000-3157
    CIS MASS SPECTROMETRY
    MERCK INDEX
    NIMH PSYCHOTROPIC DRUGS: 273
    NIOSH RTECS: KE42000, KE41000, KE43750
    C20H25N3O
    [Structure diagram of LSD]
    Ergoline-8-carboxamide, 9,10-didehydro-N,N-diethyl-6-methyl-, (8.beta.)- (9CI)
    Ergoline-8.beta.-carboxamide, 9,10-didehydro-N,N-diethyl-6-methyl- (8CI)
    (+)-LSD
    D-Lysergic acid diethylamide
    D-Lysergic acid N,N-diethylamide

Figure 5. NPROBE name search for LSD

Figure 6. Sample CIDS key codes. [Two-column table of keys and the structures they denote; the structure drawings are not reproduced. Keys legible in the figure: SCN 1, SCN 35, FG 219 (O=P-O-).]

    OPTION? SPROBE
    SPECIFY STRUCTURAL FEATURE CODE AND PERMISSIBLE MULTIPLICITY LIMITS
    NEXT SFC = SCN49
    FOUND 428 COMPOUNDS HAVING 1 OR MORE OCCURRENCES OF SCN49
    NEXT SFC = SCN35
    FOUND 277 COMPOUNDS HAVING 1 OR MORE OCCURRENCES OF SCN35
    NEXT SFC = _
    FILE = 11, 2 COMPOUNDS CONTAIN ALL 2 CODES
    OPTION? SSHOW 11
    HOW MANY STRUCTURES (E TO EXIT) ? 1
    TYPE E TO TERMINATE DISPLAY
    STRUCTURE 1 CAS REGISTRY NUMBER 6425-41-8
    NIOSH RTECS: QE06400, QE06700
    C10H19NO
    [Structure diagram of 4-cyclohexylmorpholine]
    Morpholine, 4-cyclohexyl- (8CI9CI)
    Cyclohexylmorpholine
    N-Cyclohexylmorpholine
    4-Cyclohexylmorpholine

Figure 7. CIDS key search for cyclohexyl morpholine compounds

Molecular Weight (MW) and Formula (MF) Search

In addition to searching for a particular functional group using the CIDS keys as shown above, it is possible to search for a compound, or a group of compounds, using molecular weight. The molecular weight search, shown in Figure 8, accepts either a specific molecular weight or, as in the figure, a range of molecular weights. In the example shown in Figure 8, the Merck Index is searched for all compounds with a molecular weight between 368 and 380. There are 167 such substances, as can be seen in the top part of Figure 8. This is too large a number, so the search was narrowed with a molecular formula search. What was really sought were all compounds which have two oxygen atoms and a molecular weight between 368 and 380. Figure 8 shows a search for this partial formula (O2), followed by a Boolean AND operation (INTERsect) between the file of 167 compounds in the correct molecular weight range and the file of 1484 compounds having the correct partial formula. The result of this AND operation is a file containing the 16 compounds in the Merck Index which have a molecular weight between 368 and 380 as well as exactly two oxygen atoms in the molecule. At the bottom of Figure 8, the first of the 16 answers is printed out. This compound, with a molecular formula of C21H23ClFNO2 and a molecular weight of 375, is haloperidol, a drug used as a sedative and tranquilizer.

If there is no interest in chlorinated compounds, even though they may meet the molecular weight and molecular formula criteria, a further molecular formula search may be conducted, as shown in Figure 9, for compounds with 1-4 chlorine atoms. Figure 9 shows that there are 986 compounds with 1-4 chlorine atoms in the Merck Index file. Since the requirement was for compounds that do not contain this halogen, a Boolean NOT operation between the 16 compounds previously found and the 986 chlorine-containing compounds is performed, as seen in the center of Figure 9. This removes three of the sixteen substances; of the remaining thirteen, the first one, Androsta-3,5-dien-17-ol, 3-(cyclopentyloxy)-17-methyl-, (17.beta.)-, is printed out at the bottom of Figure 9. This compound, like the other twelve in the file, does not contain the chlorine that was present in three of the answers to the first search shown in Figure 8. The ability to interact with the system and impose various limitations and filters on a search is a very powerful capability of the SANSS.
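The sequence of searches in Figures 8 and 9 (a molecular weight range, a partial formula, INTER, a ranged formula, NOT) can be mimicked with numbered result files held as sets. The Python sketch below is a hedged illustration, not the SANSS implementation: the class and method names are hypothetical, the four records stand in for the Merck Index, and the integer molecular weights other than the 375 quoted for haloperidol are approximate values assumed for the example.

```python
import re

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C21H23ClFNO2'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

class Session:
    """Each search stores its hits in a new numbered file, as in the figures."""

    def __init__(self, records):
        self.records = records   # CAS number -> (molecular weight, formula)
        self.files = {}          # file number -> set of CAS numbers
        self._next = 1

    def _store(self, hits):
        number = self._next
        self.files[number] = set(hits)
        self._next += 1
        return number

    def mw(self, low, high):
        return self._store(cas for cas, (weight, _) in self.records.items()
                           if low <= weight <= high)

    def mf_range(self, element, low, high):
        return self._store(cas for cas, (_, formula) in self.records.items()
                           if low <= parse_formula(formula).get(element, 0) <= high)

    def inter(self, a, b):       # Boolean AND of two result files
        return self._store(self.files[a] & self.files[b])

    def not_(self, a, b):        # members of file a that are not in file b
        return self._store(self.files[a] - self.files[b])

records = {
    "52-86-8": (375, "C21H23ClFNO2"),  # haloperidol (Figure 8)
    "67-81-2": (370, "C25H38O2"),      # first answer of Figure 9
    "50-29-3": (354, "C14H9Cl5"),      # p,p'-DDT, outside the weight range
    "50-37-3": (323, "C20H25N3O"),     # LSD, outside the weight range
}
s = Session(records)
f_mw = s.mw(368, 380)
f_o2 = s.mf_range("O", 2, 2)           # exactly two oxygen atoms
f_and = s.inter(f_mw, f_o2)
f_cl = s.mf_range("Cl", 1, 4)
f_not = s.not_(f_and, f_cl)
print(sorted(s.files[f_not]))
```

Because every intermediate answer is kept as its own file, any earlier result can be reused in a later INTER or NOT without repeating the search, which is the interactive filtering the text describes.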

    OPTION? MW
    TYPE MW OR RANGE, CR TO EXIT
    USER: 368-380
    FILE = 4, 167 COMPOUNDS WITH MW 368-380
    OPTION? MF
    CR TO EXIT, COMPLETE (C), PARTIAL (P), OR RANGED (R) MF?
    USER: P
    THE NUMBER OF ATOM TYPES IS: 1
    ENTER ATOM, FOLLOWED BY COUNT FOR EACH TYPE, E.G. C6.
    TYPE 1 IS: O2
    FILE = 5, 1484 COMPOUNDS HAVING PARTIAL MF: O2
    CR TO EXIT, COMPLETE (C), PARTIAL (P), OR RANGED (R) MF?
    USER:
    OPTION? INTER 4 5
    FILE = 6 RESULTING REFERENCES = 16 SOURCE FILES WERE: 4 5
    OPTION? SSHOW 6
    HOW MANY STRUCTURES (E TO EXIT) ? 1
    TYPE E TO TERMINATE DISPLAY
    STRUCTURE 1 CAS REGISTRY NUMBER 52-86-8
    MERCK INDEX
    C21H23CLFNO2
    [Structure diagram of haloperidol]
    1-Butanone, 4-[4-(4-chlorophenyl)-4-hydroxy-1-piperidinyl]-1-(4-fluorophenyl)- (9CI)

Figure 8. Molecular-weight range search

    OPTION? MF
    CR TO EXIT, COMPLETE (C), PARTIAL (P), OR RANGED (R) MF?
    USER: R
    THE NUMBER OF ATOM TYPES IS: 1
    ENTER ATOM, FOLLOWED BY RANGE FOR EACH TYPE, E.G. C6,12.
    TYPE 1 IS: CL1,4
    FILE = 7, 986 COMPOUNDS HAVING PARTIAL MF IN RANGE: CL1-4
    OPTION? NOT 6 7
    FILE = 8 RESULTING REFERENCES = 13 SOURCE FILES WERE: 6 7
    OPTION? SSHOW 8
    HOW MANY STRUCTURES (E TO EXIT) ? 5
    TYPE E TO TERMINATE DISPLAY
    STRUCTURE 1 CAS REGISTRY NUMBER 67-81-2
    MERCK INDEX
    C25H38O2
    [Structure diagram]
    Androsta-3,5-dien-17-ol, 3-(cyclopentyloxy)-17-methyl-, (17.beta.)- (9CI)

Figure 9. Sample of combination searches of MF, MW with NOT logic

Nucleus - Ring Search (RPROBE)

One of the features of the CIS SANSS that has made the system useful is the structure of the file with respect to ring systems. The SANSS has a hierarchical file structure that allows for rapid and inexpensive searching for specific rings or ring systems. Figure 10 lists some of the commands used to generate structures.

To show how the SANSS works and how one can use the various query modules, the remainder of the chapter will be devoted to searching through the NIOSH RTECS data base for chemicals having an aromatic ring substituted on ortho carbons with chlorine and bromine respectively. The first thing that must be done in order to perform such a search is to build the query structure that is to be sought. This is done with the first few commands shown in Figure 11. The query structure in Figure 11 is an ortho chloro, bromo substituted benzene ring, but the ring probe search will be conducted for any ortho disubstituted aromatic ring, since it does not take into account the nature of the substituents. Also, since other substituents on the benzene ring will be permitted, it is necessary to reset the substituent search level from EXACT (only two substituents, and these must be ortho) to IMBED (there must be at least two ortho substituents). The command to do this is EXIM, which is short for EXact/IMbed switch. The search shown in Figure 11 reveals that there are 2715 compounds in the NIOSH RTECS file that contain at least this ring pattern. To filter such potentially broad responses further, one can use CIDS key searches and other such constraints, as shown below.
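The difference between the EXACT and IMBED substituent levels can be made concrete with a small Python sketch. It assumes a deliberately simplified model, not taken from the chapter, in which a six-membered ring is described only by the set of ring positions (0 to 5) that carry a substituent; the function names are hypothetical.

```python
def has_ortho_pair(substituted, ring_size=6):
    """True if two adjacent ring positions both carry a substituent."""
    return any(position in substituted and (position + 1) % ring_size in substituted
               for position in range(ring_size))

def ring_matches(substituted, mode="EXACT"):
    """Test a ring against the ortho-disubstituted query pattern.

    EXACT: exactly two substituents, and they must be ortho.
    IMBED: at least two ortho substituents; further substituents are allowed.
    """
    positions = set(substituted)
    if not has_ortho_pair(positions):
        return False
    if mode == "EXACT":
        return len(positions) == 2
    if mode == "IMBED":
        return True
    raise ValueError("mode must be 'EXACT' or 'IMBED'")

print(ring_matches({0, 1}, "EXACT"))     # ortho pair only
print(ring_matches({0, 1, 3}, "EXACT"))  # extra substituent present
print(ring_matches({0, 1, 3}, "IMBED"))  # extra substituent tolerated
print(ring_matches({0, 2}, "IMBED"))     # meta pair, no ortho pair
```

As in the ring probe described above, the sketch ignores what the substituents are; only their positions on the ring are compared.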

Fragment Search (FPROBE)

One feature necessary to any structure search system is the ability to search for atom-centered fragments. In a fragment search the user must specify an atom and its neighbors. The exact (or generic) nature of the bonds between this central atom and each of its neighbors is then entered and a search is conducted for all occurrences of such a fragment. If a query structure has already been generated, as was done in Figure 11, that structure can be used by the SANSS program to generate and search for fragments. There are usually a number of atoms in a query structure that can be considered as central to a fragment. Hence, a request for a fragment probe of the substructure shown in Figure 11 would lead to searches for six fragments, four of which would be the same (i.e., the atom-centered fragments about atoms 3, 4, 5 and 6 are all the same, each representing a carbon atom in an aromatic ring attached to two other aromatic carbon atoms in the ring and a hydrogen). Such fragments are not very specific, and so it is best to identify the atom-centered fragment for which one wishes to search. In Figure 12, atom number 1 is selected and a search for all occurrences of a chlorine atom on an aromatic ring is


COMMAND            EFFECT

AATOM n1 m1        Insert an atom between atom n1 and atom m1.
ABOND n1 m1        Insert a bond between n1 and m1.
ABRAN l1 at n1     Add a branch of length l1 at atom n1.
ALINK n1 l1 m1     Insert a chain of length l1 between n1 and m1.
ALTBD n1 m1        Define alternate bonds in the smallest ring containing n1 and m1 as aromatic bonds.
ARING n1 m1 l1     Create a ring of l1 atoms between n1 and m1.
CHAIN l            Create a chain of l atoms.
CLEAR              Erase the existing query structure.
CRING n1 l1        Create a ring of l1 atoms including atom n1.
DATOM n1           Delete atom n1.
DBOND n1 m1        Delete the bond joining n1 and m1.
MORGA              Renumber the query structure by the Morgan algorithm.
NUC66              Create a structure of two fused six-membered rings.
REG                Retrieve the structure corresponding to a specific registry number.
REST               Negate the effect of the previous command.
RING l             Create a ring of l atoms.
SATOM n1           Define the elemental nature of atom n1.
SBOND n1 m1        Define the nature of the bond joining n1 and m1.
SPIRO n1 l1        Create a spiro-attached ring of (l1 + 1) atoms at n1.
WISBD n1 m1        Define alternate bonds in the smallest ring containing n1 and m1 as double bonds.

Figure 10. Commands used to generate structures for searching
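The structure-generation commands of Figure 10 can be read as operations on a connection table. The following Python sketch models four of them (RING, ABRAN, SATOM and ALTBD) and uses them to build the chloro bromo benzene query described in the text. It is only an illustration: the class `QueryStructure`, its method names, the default of carbon for new atoms and the simplified ring handling in `altbd` are all assumptions, not a description of how the SANSS is implemented.

```python
class QueryStructure:
    """Toy connection table: atoms are numbered from 1, bonds are unordered pairs."""

    def __init__(self):
        self.elements = {}     # atom number -> element symbol
        self.bonds = {}        # frozenset({a, b}) -> bond type
        self.last_ring = []    # atoms created by the most recent ring() call

    def _new_atom(self):
        number = len(self.elements) + 1
        self.elements[number] = "C"   # assumption: new atoms start as carbon
        return number

    def ring(self, size):
        """RING l: create a ring of `size` atoms."""
        atoms = [self._new_atom() for _ in range(size)]
        for i, atom in enumerate(atoms):
            self.bonds[frozenset((atom, atoms[(i + 1) % size]))] = "single"
        self.last_ring = atoms

    def abran(self, length, at):
        """ABRAN l1 AT n1: add a branch of `length` atoms at atom `at`."""
        previous = at
        for _ in range(length):
            new = self._new_atom()
            self.bonds[frozenset((previous, new))] = "single"
            previous = new

    def satom(self, atom, symbol):
        """SATOM n1: define the element of atom n1."""
        self.elements[atom] = symbol

    def altbd(self, n1, m1):
        """ALTBD n1 m1: mark the ring containing n1 and m1 as aromatic.

        Simplification: only the ring made by the last ring() call is considered.
        """
        ring = set(self.last_ring)
        if n1 not in ring or m1 not in ring:
            raise ValueError("atoms are not in the most recent ring")
        for pair in self.bonds:
            if pair <= ring:
                self.bonds[pair] = "aromatic"

    def neighbors(self, atom):
        """Return the sorted atom numbers bonded to `atom`."""
        return sorted(other for pair in self.bonds if atom in pair
                      for other in pair if other != atom)


def build_ortho_chloro_bromo_benzene():
    """Six-membered ring, one-atom branches at atoms 1 and 2, then Cl and Br."""
    query = QueryStructure()
    query.ring(6)            # atoms 1-6
    query.abran(1, at=1)     # atom 7
    query.abran(1, at=2)     # atom 8
    query.satom(7, "Cl")
    query.satom(8, "Br")
    query.altbd(1, 2)
    return query


query = build_ortho_chloro_bromo_benzene()
print(query.elements)
print(query.neighbors(1), query.neighbors(2))
```

Keeping the atom numbering explicit matters because the later probe commands (for example FPROBE 1) refer to atoms by these numbers.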


ENTER NEW SELECTION (H FOR HELP): 32
COLLECTION SELECTED: 32
OPTION?
OPTION? RING
OPTION? ABRAN 1 AT 1 1 AT 2
OPTION? SATOM 7
SPECIFY ELEMENT SYMBOL = CL
OPTION? SATOM 8
SPECIFY ELEMENT SYMBOL = BR
OPTION? ALTBD 1 2
OPTION? D

[structure display: six-membered ring of atoms 1-6, with 7CL attached to atom 1 and 8BR attached to atom 2]

OPTION? EXIM
SPECIFY SEARCH LEVELS TO BE CHANGED
LEVELS = 4
OPTION? RPROBE

[display of the ring pattern being searched]

CONDITIONS OF SEARCH
CHARACTERISTICS TO BE MATCHED     TYPE OF MATCH
TYPE OF RING OR NUCLEUS           EXACT
NO HETEROATOMS                    EXACT
SUBSTITUENTS AT 1 2               IMBED

THIS RING/NUCLEUS OCCURS IN 2715 COMPOUNDS
FILE = 1, 2715 COMPOUNDS CONTAIN THIS RING/NUCLEUS

Figure 11. A ring-probe (RPROBE) search for a disubstituted benzene
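The difference between the EXACT and IMBED substituent levels can be made concrete with a small sketch. Here each ring in a hypothetical file is reduced to the set of ring positions that carry a substituent, and the query is the ortho pattern (two adjacent positions). The function names, the 0-based position sets and the sample entries are assumptions made for illustration; the SANSS file organization itself is not shown in this chapter.

```python
def symmetric_images(positions, ring_size=6):
    """All rotations and reflections of a set of 0-based ring positions."""
    images = set()
    for shift in range(ring_size):
        images.add(frozenset((p + shift) % ring_size for p in positions))
        images.add(frozenset((shift - p) % ring_size for p in positions))
    return images


def ring_matches(query, candidate, level, ring_size=6):
    """Compare substituent patterns at the EXACT or IMBED level."""
    images = symmetric_images(query, ring_size)
    candidate = frozenset(candidate)
    if level == "EXACT":
        # Only the query substituents may be present.
        return candidate in images
    if level == "IMBED":
        # The query substituents must be present; others are allowed.
        return any(image <= candidate for image in images)
    raise ValueError("level must be EXACT or IMBED")


ORTHO_QUERY = {0, 1}

# Hypothetical file entries: name -> substituted ring positions.
RING_FILE = {
    "ortho-disubstituted": {0, 1},
    "meta-disubstituted": {0, 2},
    "para-disubstituted": {0, 3},
    "1,2,4-trisubstituted": {0, 1, 3},
    "1,2,4,5-tetrasubstituted": {0, 1, 3, 4},
}

for level in ("EXACT", "IMBED"):
    hits = [name for name, positions in RING_FILE.items()
            if ring_matches(ORTHO_QUERY, positions, level)]
    print(level, hits)
```

Switching from EXACT to IMBED widens the answer set, which is why the ring probe in Figure 11 returns a large file that needs further screening.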


performed. The result of this search is a file containing all 1618 compounds in the NIOSH RTECS file that contain this particular structure fragment. After the fragment search is conducted for the chloro aromatic fragment, a similar search is performed on the fragment centered about atom 2, which contains a bromo substituent. This fragment probe (FPROBE) search, shown in Figure 13, results in 229 occurrences of this fragment in compounds in the NIOSH RTECS data base.
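The observation that the query of Figure 11 yields six atom-centered fragments, four of them identical, can be reproduced with a short sketch. The connection table below follows the atom numbering described in the text (ring atoms 1-6, chlorine as atom 7 on atom 1, bromine as atom 8 on atom 2). The tuple representation of a fragment and the rule that each aromatic carbon carries three attachments, with hydrogen filling any gap, are assumptions chosen for illustration.

```python
from collections import Counter

ELEMENTS = {1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C", 7: "Cl", 8: "Br"}
BONDS = {
    (1, 2): "aromatic", (2, 3): "aromatic", (3, 4): "aromatic",
    (4, 5): "aromatic", (5, 6): "aromatic", (6, 1): "aromatic",
    (1, 7): "single", (2, 8): "single",
}


def attachments(atom):
    """List (bond type, neighbor element) for every bond at `atom`."""
    found = []
    for (a, b), bond in BONDS.items():
        if atom == a:
            found.append((bond, ELEMENTS[b]))
        elif atom == b:
            found.append((bond, ELEMENTS[a]))
    return found


def atom_centered_fragment(atom):
    """Central element plus its sorted attachments, hydrogens made explicit."""
    attached = attachments(atom)
    if ELEMENTS[atom] == "C":
        # Assumption: an aromatic carbon has three attachments in total.
        attached += [("single", "H")] * (3 - len(attached))
    return (ELEMENTS[atom], tuple(sorted(attached)))


# Atoms with at least two neighbors can serve as fragment centers.
centers = [atom for atom in ELEMENTS if len(attachments(atom)) >= 2]
fragments = {atom: atom_centered_fragment(atom) for atom in centers}
tally = Counter(fragments.values())

for atom, fragment in fragments.items():
    print(atom, fragment)
print("distinct fragments:", len(tally))
```

Only the fragments centered on atoms 1 and 2 carry a halogen, which is why the text selects those two atoms for the FPROBE searches.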


Substructure Search (SUBSS)

The Substructure Search option is an atom-by-atom, bond-by-bond comparison between connection tables in the data base and the connection tables corresponding to the query structure. This time consuming, sequential search is quite costly and so the ring probe, fragment probe, and other search techniques described above are used as screens to speed up the process and reduce the cost. Following the three separate searches done in Figures 11-13, the next step is to see which compounds in the NIOSH RTECS data base contain occurrences of all three. This is done by a simple Boolean AND logic combination of the three lists of Registry Numbers generated by the searches in these Figures. The intersection of the lists, performed by the INTER command as shown in Figure 14, results in 12 compounds meeting the criteria of all three searches. However, not necessarily all of the 12 answers are precisely what is wanted. This is because the three searches in Figures 11-13 are for "pieces" of the structure sought but the searches do not require these pieces to be in the same juxtaposition as in the query structure. That is, the three requirements comprise a necessary, but not sufficient condition for an answer to the original question. To secure an exact answer as to how many (if any) of these 12 compounds meet the exact query structure, it is necessary to perform a true substructure search (SUBSS) as is shown in Figure 14. The result of the use of SUBSS shows that only 7 of 12 "answers" from the intersection of the three searches do have the bromine and chlorine ortho to one another on the benzene ring. Of the 7 answers, one is shown in Figure 15. As it turns out from inspection of all 12 prior answers (not shown here), the other compounds retrieved are meta substituted chloro bromo aromatic compounds.

Complete Structure Search (IDENT)

The final SANSS module to be described in this chapter is the search for a total or full structure, rather than a substructure. This module was designed primarily for the purpose of searching for and reporting specific chemicals as part of the TSCA inventory reporting procedures. The full structure search, called IDENT (for IDENTity), has and will continue to have


OPTION? FPROBE 1
TYPE E TO EXIT FROM ALL SEARCHES, T TO PROCEED TO NEXT FRAGMENT SEARCH
FRAGMENT:
7CL????1C
6C
2C
REQUIRED OCCURRENCES FOR HIT : 1
THIS FRAGMENT OCCURS IN 1618 COMPOUNDS
FILE = 2, 1618 COMPOUNDS CONTAIN THIS FRAGMENT

Figure 12. A fragment probe (FPROBE) for a chlorine atom attached to an aromatic carbon atom

OPTION? FPROBE 2
TYPE E TO EXIT FROM ALL SEARCHES, T TO PROCEED TO NEXT FRAGMENT SEARCH
FRAGMENT:
8BR????2C
1C
3C
REQUIRED OCCURRENCES FOR HIT : 1
THIS FRAGMENT OCCURS IN 229 COMPOUNDS
FILE = 3, 229 COMPOUNDS CONTAIN THIS FRAGMENT

Figure 13. A fragment probe (FPROBE) for a bromine atom attached to an aromatic carbon atom

OPTION? INTER 1 2 3
FILE = 4, SOURCE FILES WERE: 1 2 3
RESULTING REFERENCES = 12
OPTION? SUBSSS 4
DOING SUB-STRUCTURE SEARCH
TYPE E TO EXIT
FILE ITEM 10    STRUCTURE BEING SEARCHED 21609905    HITS SO FAR 6
FILE = 5, SUCCESSFUL SUB STRUCTURES = 7

Figure 14. Intersection and substructure search of files derived in Figures 11-13
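The two-stage strategy, cheap screens combined by Boolean AND followed by an atom-by-atom comparison of the survivors, can be sketched as follows. The registry identifiers, the miniature compound file and the contents of the three screen files are hypothetical. The matcher is a plain backtracking search that compares elements and connectivity only and ignores bond types; it stands in for the SUBSS step and is not taken from the SANSS code.

```python
def benzene(substituents):
    """Substituted benzene: ring atoms 0-5, one atom per substituted position."""
    elements = ["C"] * 6
    adjacency = {i: {(i + 1) % 6, (i - 1) % 6} for i in range(6)}
    for position, element in substituents.items():
        new = len(elements)
        elements.append(element)
        adjacency[new] = {position}
        adjacency[position].add(new)
    return elements, adjacency


def is_substructure(query, target):
    """Atom-by-atom backtracking match of query onto target."""
    q_elements, q_adjacency = query
    t_elements, t_adjacency = target

    def extend(mapping):
        if len(mapping) == len(q_elements):
            return True
        q_atom = len(mapping)                      # next unmapped query atom
        for t_atom in range(len(t_elements)):
            if t_atom in mapping.values():
                continue
            if t_elements[t_atom] != q_elements[q_atom]:
                continue
            # Bonds to already-mapped query atoms must exist in the target.
            if all(mapping[n] in t_adjacency[t_atom]
                   for n in q_adjacency[q_atom] if n in mapping):
                mapping[q_atom] = t_atom
                if extend(mapping):
                    return True
                del mapping[q_atom]
        return False

    return extend({})


QUERY = benzene({0: "Cl", 1: "Br"})

# Hypothetical compound file keyed by made-up registry identifiers.
COMPOUNDS = {
    "R1": benzene({0: "Cl", 1: "Br"}),            # ortho chloro bromo
    "R2": benzene({0: "Br", 1: "Cl", 3: "Cl"}),   # ortho pair plus extra Cl
    "R3": benzene({0: "Cl", 1: "C", 2: "Br"}),    # Cl and Br are meta
    "R4": benzene({0: "Cl", 1: "Cl"}),            # no bromine
}

# Hypothetical screen results, playing the role of files 1-3.
ring_file = {"R1", "R2", "R3", "R4"}
chloro_file = {"R1", "R2", "R3", "R4", "R5"}
bromo_file = {"R1", "R2", "R3", "R6"}

candidates = ring_file & chloro_file & bromo_file         # INTER
hits = sorted(r for r in candidates
              if is_substructure(QUERY, COMPOUNDS[r]))    # SUBSS
print(sorted(candidates), hits)
```

Compound R3 passes all three screens yet fails the final comparison, which mirrors the meta substituted false drops described in the text.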


STRUCTURE 7    CAS REGISTRY NUMBER 4824-78-6
NIOSH RTECS: TE70000
C10H12BrCl2O3PS

[structure diagram]

Phosphorothioic acid, O-(4-bromo-2,5-dichlorophenyl) O,O-diethyl ester (8CI9CI)
Bromophos-ethyl
Ethyl bromophos
Filariol 60
Nexagan G

Figure 15. One of seven substructure search hits


specific application to TSCA activities. For example, after the final "grandfather" inventory required under section 8 of the Act is published and made available, via the CIS as well as by other means, it will be necessary for potential vendors of a chemical to determine if the chemical they wish to sell or manufacture is in the Inventory and can thus be produced and marketed without extensive pre-manufacturing testing. Use of the IDENT search will quickly reveal if the chemical is in the TSCA inventory. Of course, one can use the name search capabilities, but there is no guarantee that the name used by the manufacturer will be in the list of synonyms associated with the inventory. The structure shown in Figure 16 was generated using the standard SANSS structure generation commands, such as those listed in Figure 10. The IDENT search was then invoked and, after being told that the structure had the normal number of hydrogen atoms, consistent with normal valence, it found the structure in the CIS UDB. The structure was then printed out, with all the local file identifier information, as well as a number of synonyms, one of which is the TSCA Clerical Code Designation number for the substance.
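The hydrogen count that IDENT reports for a structure under normal valence can be illustrated with a short calculation. The sketch below assigns each atom the smallest normal valence that covers its bonds and counts the remainder as hydrogens, using a connection table for the compound of Figure 16 (molecular formula C4H8Cl3O4P). The valence table, the atom numbering and the function name are assumptions made for this example; the numbering does not follow the display in Figure 16.

```python
# Assumed table of normal valences; the smallest one that fits is used.
NORMAL_VALENCES = {
    "C": (4,), "O": (2,), "Cl": (1,), "Br": (1,), "P": (3, 5), "S": (2, 4, 6),
}


def implicit_hydrogens(elements, bonds):
    """Return {atom: hydrogen count} implied by normal valence."""
    bond_sum = {atom: 0 for atom in elements}
    for (a, b), order in bonds.items():
        bond_sum[a] += order
        bond_sum[b] += order
    counts = {}
    for atom, symbol in elements.items():
        used = bond_sum[atom]
        valence = next((v for v in NORMAL_VALENCES[symbol] if v >= used), None)
        if valence is None:
            raise ValueError(f"abnormal valence at atom {atom}")
        counts[atom] = valence - used
    return counts


# Dimethyl (2,2,2-trichloro-1-hydroxyethyl)phosphonate, hypothetical numbering.
ELEMENTS = {
    1: "C", 2: "C", 3: "P", 4: "O", 5: "O", 6: "O",
    7: "C", 8: "C", 9: "Cl", 10: "Cl", 11: "Cl", 12: "O",
}
BONDS = {
    (1, 2): 1, (1, 9): 1, (1, 10): 1, (1, 11): 1,   # CCl3 carbon
    (2, 12): 1, (2, 3): 1,                          # CH(OH) carbon
    (3, 4): 2, (3, 5): 1, (3, 6): 1,                # P=O and two P-O
    (5, 7): 1, (6, 8): 1,                           # two O-CH3
}

hydrogens = implicit_hydrogens(ELEMENTS, BONDS)
total = sum(hydrogens.values())
print(hydrogens, total)
```

The total of eight agrees with the molecular formula and with the proton count shown in the IDENT dialogue of Figure 16.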


SANSS-Data Base Interfaces

A structure or a nomenclature search is generally only a means to an end. The end is often some data associated with the structures found. In order to facilitate retrieval of such information, an interface between the CIS numeric data bases and the SANSS has been constructed. This allows for a search through the UDB followed by a data search (or retrieval) and permits one to answer such queries as:
* Do any ortho bromo-chloro aromatic compounds have a toxicity greater than 1.0 mg/kg?
In the example shown in Figure 17, the first three answers from the previous search are used to retrieve the toxicity data associated with these compounds. The automatic interface between the systems is invoked by the command TSHOW and then the previous file of 7 CAS Registry Numbers, generated by SUBSSS, is specified, with only the first three being printed out upon request.

Summary

The NIH/EPA CIS has developed to the point where complex questions can be readily answered. The ability to manipulate structure and numeric data and establish correlations between the two should be of considerable value to the EPA in its work under the Toxic Substances Control Act, as well as to scientists in general. The value of the SANSS linked to CNMR data has been recently shown (10), and no doubt other structure-data studies


OPTION? D

[structure display: 12-atom query structure, with atom 2 = P, atoms 4-7 = O and atoms 8-10 = CL]

OPTION? IDENT
TOTAL PROTON COUNT FOR THIS STRUCTURE (P FOR PROGRAM ESTIMATE) : P
TOTAL PROTON COUNT BASED UPON NORMAL CONDITIONS IS 8
ARE THERE ANY ABNORMAL VALENCE OR CHARGE CONDITIONS WHICH WOULD AFFECT THIS COUNT (Y/N) ? N
PROTON COUNT FOR NODE 2 (D TO DISPLAY STRUCTURE) ?
FILE 10, THIS STRUCTURE IS CONTAINED IN 1 COMPOUNDS.
OPTION? SSHOW 10

STRUCTURE 1    CAS REGISTRY NUMBER 52-68-6
TSCA CANDIDATE LIST: R001-5032
EPA PESTICIDES - ACTIVE INGREDIENTS: 57901
EPA OHM/TADS: 72T16519
CAMBRIDGE XRAY CRYSTAL: 52-68-6.01
NIOSH RTECS: TA07000
[further file identifiers whose pairing is not recoverable. Labels: MERCK INDEX; EPA PESTICIDES - ANALYTICAL REF. STNDS.; EPA CHEMICAL SPILLS; FDA/EPA PESTICIDES REF. STANDARDS; PHS-149 CARCINOGENS. Values: 6780, 48, C0147]
C4H8Cl3O4P

[structure diagram]

Phosphonic acid, (2,2,2-trichloro-1-hydroxyethyl)-, dimethyl ester (8CI9CI)
Agroforotox
Anthon
Bayer L 13/59
Chlorofos

Figure 16. Example of IDENT search for a complete molecule


DATA BASE IS NOW RTECS
OPTION? RETRIEVE
NUMBERING SYSTEM? CAS
SOURCE? FILE 5
THERE WERE 7 NUMBERS FOUND IN FILE 5
DISPLAY HOW MANY? (TYPE E TO EXIT) 3

CAS NUMBER = 2104963    NIOSH NUMBER = TE71750
ORL-RAT LD50: 1600 MG/K    TFX: TXAPA9 14,515,69
SKN-RBT LD50: 720 MG/K     TFX: GUCHAZ 6,54,73
UNK-MAM LD50: 2000 MG/K    TFX: 30ZDA9 -,335,71
Phosphorothioic acid, O-(4-bromo-2,5-dichlorophenyl) O,O-dimethyl ester (8CI9CI)
C8H8BrCl2O3PS

CAS NUMBER = 2720174    NIOSH NUMBER = TB01850
ORL-RAT LD50: 35 MG/K      TFX: ARSIM* 20,6,66
ORL-MUS LD50: 77 MG/K      TFX: ARSIM* 20,6,66
Phosphonothioic acid, ethyl-, O-(4-bromo-2,5-dichlorophenyl) O-ethyl ester (8CI9CI)
C10H12BrCl2O2PS

CAS NUMBER = 2720185    NIOSH NUMBER = TB10700
ORL-RAT LD50: 73 MG/K      TFX: ARSIM* 20,6,66
Phosphonothioic acid, methyl-, O-(4-bromo-2,5-dichlorophenyl) O-(1-methylethyl) ester (9CI)
C10H12BrCl2O2PS

Figure 17. Example of NIOSH RTECS toxicity data retrieval
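The interface between a structure search and a numeric data base amounts to a join on CAS Registry Number, followed by whatever numeric test the question calls for. The sketch below uses the three records printed in Figure 17 (LD50 values in mg/kg). The dictionary layout, the function names and the threshold of 100 mg/kg are assumptions chosen for illustration; they do not represent the CIS file formats or commands.

```python
# Toxicity records transcribed from Figure 17 (LD50 values in mg/kg).
RTECS = {
    "2104963": {"niosh": "TE71750",
                "ld50": {"ORL-RAT": 1600, "SKN-RBT": 720, "UNK-MAM": 2000}},
    "2720174": {"niosh": "TB01850",
                "ld50": {"ORL-RAT": 35, "ORL-MUS": 77}},
    "2720185": {"niosh": "TB10700",
                "ld50": {"ORL-RAT": 73}},
}

# Registry numbers as they would arrive from a structure search file.
STRUCTURE_FILE = ["2104963", "2720174", "2720185"]


def retrieve(cas_numbers, data_base, display=None):
    """Look up each registry number; optionally keep only the first few."""
    wanted = cas_numbers if display is None else cas_numbers[:display]
    return [(cas, data_base[cas]) for cas in wanted if cas in data_base]


def ld50_below(records, test, limit):
    """Registry numbers whose LD50 for the given test is under `limit`."""
    return [cas for cas, record in records
            if test in record["ld50"] and record["ld50"][test] < limit]


records = retrieve(STRUCTURE_FILE, RTECS, display=3)
more_toxic = ld50_below(records, "ORL-RAT", 100)

for cas, record in records:
    print(cas, record["niosh"], record["ld50"])
print("oral-rat LD50 below 100 mg/kg:", more_toxic)
```

Because a lower LD50 means a more toxic compound, the numeric test is written as a comparison against an upper limit.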


will be undertaken now that the necessary groundwork has been laid.

Acknowledgements

The authors wish to thank the following for their help and cooperation in developing the CIS SANSS: R. J. Feldmann, W. Greenstreet, M. Yaguda, M. Bracken, A. Fein, G. Marquart, and J. Miller.


Literature Cited

1. Heller, S.R., Milne, G.W.A., and Feldmann, R.J., Science, (1977), 195, 253.
2. Feldmann, R.J., Milne, G.W.A., Heller, S.R., Fein, A., Miller, J.A., and Koch, B., J. Chem. Info. and Comp. Sci., (1977), 17, 157.
3. The Interagency Regulatory Liaison Group (IRLG) was established 2 August, 1977 by the following four Agencies: EPA, FDA, OSHA and CPSC.
4. EPA Order #2800.2, issued 27 May, 1975.
5. Feldmann, R.J., and Heller, S.R., J. Chem. Doc., (1972), 12, 48.
6. CIDS Structure Feature Key Code Manual is available from CIS Project, Chemistry Department, Brookhaven National Laboratory, Upton, Long Island, New York 11973.
7. NIOSH, Registry of Toxic Effects of Chemical Substances (RTECS), 1977. Available from the US Government Printing Office, GPO Order Number 017-033-0027101; $17.50 per copy USA: $21.88 per copy non-USA.
8. Bracken, M., Dorigan, J., Hushon, J., and Overbey, II, J., MITRE Report MIR-7558 to CEQ, June 1977. Two volumes entitled "Chemical Substances Information Network (CSIN)".
9. NLM Fact Sheet for the Toxicology Information Program, January 1978.
10. Milne, G.W.A., Zupan, J., Heller, S.R., and Miller, J.A., Anal. Chim. Acta, In press (1978).

RECEIVED August 29, 1978.
