An Integrated System for Conducting Chemical and Biological Searches


Downloaded by UNIV OF NEW ENGLAND on January 22, 2017 | http://pubs.acs.org Publication Date: December 14, 1978 | doi: 10.1021/bk-1978-0084.ch011

T. M. DYOTT, A. M. EDLING, C. R. GARTON, W. O. JOHNSON, P. J. McNULTY, and G. S. ZANDER

Rohm and Haas Company, Norristown Road, Spring House, PA 19477

Over the past seven years we at Rohm and Haas Company have been developing a computerized chemical and biological information system called ACCIS (Agricultural Chemicals Computerized Information System) (1). In this paper we describe the chemical and biological search capabilities which we have built into ACCIS.

ACCIS Design Criteria

ACCIS was developed in order to:

1. Accommodate the growing amount of data which resulted from expanding biological screening programs.

2. Facilitate communication of screening results to researchers, administrators, and outside collaborators.

3. Reduce the time our biologists spent transcribing, extracting, and reporting screening results.

4. Enhance the value of the stored screening results by making them readily available.

To meet these objectives we decided that the system must:

1. contain not only the biological screening results, but also the chemical structures, reference data, and pertinent chemical data, e.g., solubility and purity information.

2. produce a variety of current awareness reports on standard 8 1/2 X 11 paper, or 3 X 5 or 5 X 8 cards, and include high-quality structural diagrams in those reports whenever appropriate.

3. provide a convenient mechanism for conducting a wide variety of chemical and/or biological searches.

0-8412-0465-9/78/47-084-168$05.00 © 1978 American Chemical Society

Howe et al.; Retrieval of Medicinal Chemical Information ACS Symposium Series; American Chemical Society: Washington, DC, 1978.


System Organization

ACCIS is thoroughly integrated into the everyday operation of our screening programs. The flow of information into ACCIS is diagrammed in Figure 1. When our chemists synthesize a compound they complete a compound submittal form, giving the empirical formula, structural diagram, chemical name, chemist's name, notebook reference, department, date, various physical properties, screening priorities, and any special instructions. The chemist then takes the submittal form and the sample itself to the Screening Information Center. There the information is reviewed and entered into the system via a chemical typewriter (a modified IBM MCST). Sub-samples are then weighed out and sent to the appropriate screening area(s) along with a computer-produced transmittal sheet which provides the biologists with the structural diagram, useful physical property information, and any special instructions.

The biologists then screen the compound, recording their findings on 2-part carbonless forms. They keep the first copy as a legal record, while the second copy is returned to the information center where the data are keypunched and read into the system. Whenever data are entered, various current awareness reports are automatically generated which keep the chemists, biologists, and their management informed and allow them to maintain hardcopy files. A typical ACCIS report, the herbicide current awareness report, is shown in Figure 2. (The organism names have been replaced by the letters B-L for confidentiality reasons.) AM and AD are average control data for all monocot and all dicot species, respectively.

The number of screening programs fluctuates as new programs are initiated and old ones are terminated, but is generally in the range of 8-12. Each screen may in turn include anywhere from 1 to 15 different organisms, treated under various conditions and dosages. This variability makes it essential that the biologists in each area work closely with the information specialist to design both their data collection forms and the various reports they require. Our emphasis is on meeting the researcher's needs rather than simplifying the programming. As a result ACCIS:

1. is a highly customized system.

2. consists of well over 100 programs, totaling approximately 250,000 lines of code.

3. enjoys extremely strong user support.


Figure 1. Flow of information into ACCIS (surviving diagram labels: biologists; biology data form; bound page for legal purposes)

Figure 2. A typical ACCIS report format (Rohm and Haas Company herbicide current awareness report, marked company confidential; columns include test date, test type, rate in #/A, status, organisms B-L, AM, and AD)

The chemical and biological information in ACCIS is stored in a number of computer files. The biological, miscellaneous chemical, and reference information is stored in an IMS data base. The structural diagram, as entered on the chemical typewriter, and the chemical name are stored in standard variable record length files. In order to store the chemical structures in a machine intelligible, and therefore searchable, manner we incorporated the Chemical Abstracts Service (CAS) Registry II system into ACCIS. The structures are stored in a connection table file and a fragment file is generated which improves the efficiency of the substructure search system. In addition there are a number of auxiliary files which describe the biological screens and are used to validate the biological data, allow abbreviations in the data base to be expanded in reports (data dictionaries), and supply distribution lists for various reports. The total size of our files has increased steadily since ACCIS's inception in 1973 to approximately 200 million characters.
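The role of the fragment file can be illustrated with a small sketch in Python. This is not the CAS Registry II format or search program, whose details are not given here; the connection-table layout, the choice of bonded atom pairs as fragments, and the compound labels are all assumptions made for illustration. The idea shown is only the general one: a structure that lacks any fragment of the query cannot contain the query, so it can be discarded before the more expensive atom-by-atom comparison.

```python
# Hypothetical connection table: a list of atom symbols plus
# (atom_index, atom_index, bond_order) triples. Not the CAS layout.

def fragments(atoms, bonds):
    """Return the set of bonded atom-pair fragments, e.g. ('C', 2, 'O')."""
    frags = set()
    for i, j, order in bonds:
        a, b = sorted((atoms[i], atoms[j]))  # make the pair order-independent
        frags.add((a, order, b))
    return frags

def screen(query_frags, fragment_file):
    """Return IDs of compounds whose fragment set contains every query fragment."""
    return [cid for cid, frags in fragment_file.items() if query_frags <= frags]

# Illustrative structures (hydrogens omitted); labels are made up.
connection_tables = {
    "amide":   (["C", "C", "O", "N"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)]),
    "alcohol": (["C", "C", "O"],      [(0, 1, 1), (1, 2, 1)]),
}

# The "fragment file" is generated once from the connection tables.
fragment_file = {cid: fragments(a, b) for cid, (a, b) in connection_tables.items()}

# Query substructure: a carbonyl group, C=O.
query = fragments(["C", "O"], [(0, 1, 2)])
candidates = screen(query, fragment_file)
```

Compounds that pass the screen are only candidates. Passing is necessary but not sufficient, so a full substructure match against the connection table would still be run on them.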

Search Capabilities

We found that in addition to current awareness reports we needed to be able to produce reports based on various criteria, e.g., substructure, biological activity, test date, and/or source. Typical questions might be:

1. What 5-halo isothiazolones have we made?

2. What compounds have we screened which control >80% of weed XYZ when applied at 2 lbs/acre preemergence?

3. What are the fungicide screening results for the compounds we obtained from KLM corporation?

4. What 4-nitro diphenyl ethers have we made which control >80% of weed RST when applied at 4 lbs/acre postemergence?

5. What compounds were screened for insecticidal activity during December 1977?

Different types of reports are also called for. We might need just the structures and reference information, or structures and the screening results from a particular area, or structures and the screening results from several areas. Since all of our common questions are compound oriented we designed a modular search system as shown in Figure 3.

Figure 3. Flow diagram of modular ACCIS search system (boxes: chemical search program; biological search program; selected compounds; structure report program; structure and specific biological area report programs; structure and all biological areas report program)

A suitable chemical search program for CAS Registry II files had already been developed by CAS, while the various report programs are modified versions of current awareness report programs we had previously developed. The only major new program we needed was one for searching the biological and reference information contained in the IMS data base.
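Because every question is compound oriented, the modules can communicate through nothing more than a list of selected compounds. The Python sketch below illustrates that design under stated assumptions: the record layout, function names, compound identifiers, and data are all invented for the example, and combining the two searches by set intersection is an assumption about how a question such as number 4 above might be answered, not a description of the ACCIS programs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    """One hypothetical screening result (fields are illustrative)."""
    compound: str    # compound identifier
    organism: str    # coded organism name
    test_type: str   # "PRE" or "POST" emergence
    rate: float      # application rate, lbs/acre
    control: int     # percent control observed

def biological_search(results, organism, test_type, rate, min_control):
    """Return the set of compounds exceeding min_control under the given conditions."""
    return {
        r.compound
        for r in results
        if r.organism == organism
        and r.test_type == test_type
        and r.rate == rate
        and r.control > min_control
    }

def structure_report(compounds, structures):
    """Format one report line per selected compound, in identifier order."""
    return [f"{cid}: {structures[cid]}" for cid in sorted(compounds)]

# Made-up data standing in for the screening data base.
results = [
    Result("A-1", "RST", "POST", 4.0, 95),
    Result("A-2", "RST", "POST", 4.0, 60),
    Result("A-3", "RST", "POST", 4.0, 90),
    Result("A-1", "RST", "PRE", 4.0, 10),
]
structures = {"A-1": "structure 1", "A-2": "structure 2", "A-3": "structure 3"}

# Stand-in for the output of the chemical (substructure) search program.
chemical_hits = {"A-1", "A-2"}

biological_hits = biological_search(results, "RST", "POST", 4.0, 80)
selected = chemical_hits & biological_hits   # the shared compound list
report = structure_report(selected, structures)
```

Any report program that accepts the same compound list could be substituted for `structure_report` without changing the search steps, which is the point of the modular arrangement.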


Biological Search

The biological data we need to search are contained in an IMS data base, which has a hierarchical structure, as shown in Figure 4. This hierarchical structure allows you to have any number of test areas within a compound, any number of test dates within a test area, any number of test types within a test date, etc. (There is of course more detailed information within each segment of the data base than we have depicted.)

We developed a search program which provides a very general search capability. It allows us to qualify the search on any piece (or pieces) of information in the data base and has considerable Boolean logic capabilities. For example, if we were interested in compounds within the range RH-60000 to RH-80000 which were active against fungus ABC or DEF, but did not injure crop XYZ at a rate of 4 lbs/acre, we would encode the question as:

(RH>60000*RH