11. An Integrated System for Conducting Chemical and Biological Searches
Downloaded by UNIV OF NEW ENGLAND on January 22, 2017 | http://pubs.acs.org Publication Date: December 14, 1978 | doi: 10.1021/bk-1978-0084.ch011
T. M. DYOTT, A. M. EDLING, C. R. GARTON, W. O. JOHNSON, P. J. McNULTY, and G. S. ZANDER
Rohm and Haas Company, Norristown Road, Spring House, PA 19477

Over the past seven years we at Rohm and Haas Company have been developing a computerized chemical and biological information system called ACCIS (Agricultural Chemicals Computerized Information System) (1). In this paper we will describe the chemical and biological search capabilities that we have built into ACCIS.

ACCIS Design Criteria
ACCIS was developed in order to:
1. Accommodate the growing amount of data which resulted from expanding biological screening programs.
2. Facilitate communication of screening results to researchers, administrators, and outside collaborators.
3. Reduce the time our biologists spent transcribing, extracting, and reporting screening results.
4. Enhance the value of the stored screening results by making them readily available.
To meet these objectives we decided that the system must:
1. contain not only the biological screening results, but also the chemical structures, reference data, and pertinent chemical data, e.g., solubility and purity information.
2. produce a variety of current awareness reports on standard 8 1/2 × 11 paper or on 3 × 5 or 5 × 8 cards, with high-quality structural diagrams whenever appropriate.
3. provide a convenient mechanism for conducting a wide variety of chemical and/or biological searches.

0-8412-0465-9/78/47-084-168$05.00 © 1978 American Chemical Society
Howe et al.; Retrieval of Medicinal Chemical Information; ACS Symposium Series; American Chemical Society: Washington, DC, 1978.
System Organization

ACCIS is thoroughly integrated into the everyday operation of our screening programs. The flow of information into ACCIS is diagrammed in Figure 1. When our chemists synthesize a compound, they complete a compound submittal form giving the empirical formula, structural diagram, chemical name, chemist's name, notebook reference, department, date, various physical properties, screening priorities, and any special instructions. The chemist then takes the submittal form and the sample itself to the Screening Information Center. There the information is reviewed and entered into the system via a chemical typewriter (a modified IBM MCST). Sub-samples are then weighed out and sent to the appropriate screening area(s) along with a computer-produced transmittal sheet, which provides the biologists with the structural diagram, useful physical property information, and any special instructions.

The biologists then screen the compound, recording their findings on 2-part carbonless forms. They keep the first copy as a legal record, while the second copy is returned to the information center, where the data are keypunched and read into the system. Whenever data are entered, various current awareness reports are automatically generated; these keep the chemists, biologists, and their management informed and allow them to maintain hardcopy files. A typical ACCIS report, the herbicide current awareness report, is shown in Figure 2. (The organism names have been replaced by the letters B-L for confidentiality reasons.)
AM and AD are average control data for all monocot and all dicot species, respectively.

The number of screening programs fluctuates as new programs are initiated and old ones are terminated, but is generally in the range of 8-12. Each screen may in turn include anywhere from 1 to 15 different organisms, treated under various conditions and dosages. This variability makes it essential that the biologists in each area work closely with the information specialist to design both their data collection forms and the various reports they require. Our emphasis is on meeting the researchers' needs rather than on simplifying the programming. As a result ACCIS:
1. is a highly customized system.
2. consists of well over 100 programs, totaling approximately 250,000 lines of code.
3. enjoys extremely strong user support.
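The AM and AD columns in the herbicide report are described only as average control data for all monocot and all dicot species. The following is a minimal Python sketch of one way such averages could be derived, assuming control is recorded as a percentage (0-100) per organism and that each organism code is tagged as monocot or dicot. The species table, the rounding rule, and the function name are hypothetical; the text does not say how ACCIS rounds or how it treats untested species.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical species table (organism code -> class); the real codes and
# their classification are confidential in the source report.
SPECIES_CLASS = {"B": "monocot", "C": "monocot", "D": "monocot",
                 "F": "dicot", "G": "dicot", "I": "dicot"}


def class_averages(control: dict[str, float]) -> dict[str, int | None]:
    """Return AM (mean monocot control) and AD (mean dicot control).

    `control` maps organism code -> percent control seen in one test.
    Codes missing from SPECIES_CLASS are ignored, and a class with no
    observations yields None instead of a misleading 0.
    """
    groups: dict[str, list[float]] = {"monocot": [], "dicot": []}
    for organism, pct in control.items():
        cls = SPECIES_CLASS.get(organism)
        if cls is not None:
            groups[cls].append(pct)

    def avg(values: list[float]) -> int | None:
        # round() is Python's round-half-to-even; the real rule is unknown.
        return round(mean(values)) if values else None

    return {"AM": avg(groups["monocot"]), "AD": avg(groups["dicot"])}


print(class_averages({"B": 90, "C": 100, "D": 80, "F": 40, "G": 60, "I": 50}))
# {'AM': 90, 'AD': 50}
```

Returning None for an empty class keeps "not tested" distinct from "0% control", which matters when a screen covers only monocots or only dicots.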
Figure 1. Flow of information into ACCIS
Figure 2. A typical ACCIS report format (herbicide current awareness report)
The chemical and biological information in ACCIS is stored in a number of computer files. The biological, miscellaneous chemical, and reference information is stored in an IMS data base. The structural diagram, as entered on the chemical typewriter, and the chemical name are stored in standard variable-record-length files. In order to store the chemical structures in a machine-intelligible, and therefore searchable, manner we incorporated the Chemical Abstracts Service (CAS) Registry II system into ACCIS. The structures are stored in a connection table file, and a fragment file is generated which improves the efficiency of the substructure search system. In addition, there are a number of auxiliary files which describe the biological screens and are used to validate the biological data, allow abbreviations in the data base to be expanded in reports (data dictionaries), and supply distribution lists for various reports. The total size of our files has increased steadily since ACCIS's inception in 1973, to approximately 200 million characters.
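The paragraph above notes that a fragment file generated from the connection tables improves the efficiency of substructure searching. The Python sketch below illustrates the general two-stage idea: a cheap fragment screen discards compounds that cannot possibly contain the query, and only the survivors get an atom-by-atom match. It is not the CAS Registry II algorithm. The connection-table layout, the choice of fragments (bonded element pairs with bond order), the compound identifiers, and all function names are assumptions for illustration.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical connection table: (atoms, bonds), where atoms[i] is an
# element symbol and bonds maps an atom-index pair (i < j) to bond order.
# Hydrogens are left implicit.


def fragments(ct) -> Counter:
    """Count bonded element pairs with bond order, e.g. ('C', 'O', 2)."""
    atoms, bonds = ct
    frags = Counter()
    for (i, j), order in bonds.items():
        a, b = sorted((atoms[i], atoms[j]))
        frags[(a, b, order)] += 1
    return frags


def passes_screen(query_frags: Counter, compound_frags: Counter) -> bool:
    """A compound can only contain the query if it has every query fragment."""
    return all(compound_frags[f] >= n for f, n in query_frags.items())


def contains(query, target) -> bool:
    """Backtracking atom-by-atom substructure match."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    q_adj = {i: {} for i in range(len(q_atoms))}
    for (i, j), order in q_bonds.items():
        q_adj[i][j] = q_adj[j][i] = order
    mapping: dict[int, int] = {}  # query atom index -> target atom index

    def extend(qi: int) -> bool:
        if qi == len(q_atoms):
            return True
        used = set(mapping.values())
        for ti, elem in enumerate(t_atoms):
            if elem != q_atoms[qi] or ti in used:
                continue
            # Every bond to an already-mapped query atom must exist in the
            # target with the same order.
            if all(t_bonds.get((min(ti, mapping[qj]), max(ti, mapping[qj]))) == order
                   for qj, order in q_adj[qi].items() if qj in mapping):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]
        return False

    return extend(0)


def substructure_search(query, library: dict) -> list:
    # A real system would store the fragment file rather than rebuild it.
    fragment_file = {rn: fragments(ct) for rn, ct in library.items()}
    q_frags = fragments(query)
    candidates = [rn for rn in library if passes_screen(q_frags, fragment_file[rn])]
    return [rn for rn in candidates if contains(query, library[rn])]


# Query: a carbonyl carbon single-bonded to nitrogen (amide-like fragment).
query = (["C", "O", "N"], {(0, 1): 2, (0, 2): 1})
library = {
    "RH-A": (["C", "C", "O", "N"], {(0, 1): 1, (1, 2): 2, (1, 3): 1}),  # acetamide
    "RH-B": (["C", "C", "O", "C"], {(0, 1): 1, (1, 2): 2, (1, 3): 1}),  # acetone
    "RH-C": (["N", "C", "C", "O"], {(0, 1): 1, (1, 2): 1, (2, 3): 2}),  # aminoacetaldehyde
}
print(substructure_search(query, library))  # ['RH-A']
```

Here RH-B is rejected by the screen alone (it has no C-N bond), while RH-C passes the screen but fails the atom-by-atom match, because its C=O carbon is not the carbon bonded to nitrogen. That is why a screen can only narrow the candidate list, never replace the full match.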
Search Capabilities

We found that in addition to current awareness reports we needed to be able to produce reports based on various criteria, e.g., substructure, biological activity, test date, and/or source. Typical questions might be:
1. What 5-halo isothiazolones have we made?
2. What compounds have we screened which control >80% of weed XYZ when applied at 2 lbs/acre preemergence?
3. What are the fungicide screening results for the compounds we obtained from KLM Corporation?
4. What 4-nitrodiphenyl ethers have we made which control >80% of weed RST when applied at 4 lbs/acre postemergence?
5. What compounds were screened for insecticidal activity during December 1977?
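Questions 2 and 5 qualify on biological details: screening area, test date, test type, rate, and organism. The Python sketch below shows how such qualifications might be evaluated, assuming results are nested by compound, screening area, test date, test type, rate, and organism. That nesting, the RH numbers, the organism codes, the results, and the function names are all hypothetical stand-ins, not the actual IMS segment layout.

```python
from datetime import date

# Hypothetical nesting: RH number -> screening area -> test date
# -> test type ("PRE"/"POST") -> rate (lb/acre) -> organism -> % control.
db = {
    60001: {"herbicide": {date(1977, 12, 5): {"PRE": {2: {"XYZ": 90, "RST": 40}}}}},
    60002: {"herbicide": {date(1977, 11, 2): {"PRE": {2: {"XYZ": 70}}}},
            "insecticide": {date(1977, 12, 12): {"SPRAY": {}}}},
    60003: {"insecticide": {date(1978, 1, 9): {"SPRAY": {}}}},
}


def controls(db, area, organism, min_pct, rate, test_type):
    """Question 2 style: compounds giving > min_pct control of an organism."""
    hits = []
    for rh, areas in db.items():
        for tests in areas.get(area, {}).values():  # one entry per test date
            pct = tests.get(test_type, {}).get(rate, {}).get(organism)
            if pct is not None and pct > min_pct:
                hits.append(rh)
                break  # one qualifying test is enough for this compound
    return hits


def screened_during(db, area, start, end):
    """Question 5 style: compounds with any test date in [start, end]."""
    return [rh for rh, areas in db.items()
            if any(start <= d <= end for d in areas.get(area, {}))]


print(controls(db, "herbicide", "XYZ", 80, 2, "PRE"))  # [60001]
print(screened_during(db, "insecticide", date(1977, 12, 1), date(1977, 12, 31)))  # [60002]
```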
Different types of reports are also called for. We might need just the structures and reference information, the structures and the screening results from a particular area, or the structures and the screening results from several areas. Since all of our common questions are compound-oriented, we designed a modular search system, as shown in Figure 3.
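Because every common question is compound-oriented, each search module needs to hand the next stage nothing more than a list of compound numbers. The Python sketch below illustrates that hand-off, assuming the chemical and biological search modules each return a set of RH numbers, that these are ANDed into a selected-compounds list, and that the list is passed to whichever report program is chosen. The function names, record layout, and sample data are hypothetical; the text does not describe how ACCIS actually combines module outputs.

```python
def select_compounds(*module_hits):
    """AND together the RH-number sets from whichever search modules ran.

    A module that was not run is passed as None and imposes no restriction.
    """
    sets = [set(hits) for hits in module_hits if hits is not None]
    if not sets:
        raise ValueError("at least one search module must be run")
    return sorted(set.intersection(*sets))


def structure_report(rh, record):
    return f"RH-{rh}: {record['structure']}"


def area_report(area):
    """Build a report program for the structure plus one screening area."""
    def report(rh, record):
        result = record["results"].get(area, "not tested")
        return f"RH-{rh}: {record['structure']} | {area}: {result}"
    return report


def all_areas_report(rh, record):
    areas = "; ".join(f"{a}: {r}" for a, r in sorted(record["results"].items()))
    return f"RH-{rh}: {record['structure']} | {areas or 'no screening data'}"


def run_report(selected, records, report):
    return [report(rh, records[rh]) for rh in selected]


# Question 4: chemical hits (4-nitrodiphenyl ethers) AND biological hits.
chemical_hits = {60001, 60002, 60005}
biological_hits = {60002, 60003, 60005}
records = {
    60002: {"structure": "<diagram 60002>",
            "results": {"herbicide": "RST 90% at 4 lb/A POST"}},
    60005: {"structure": "<diagram 60005>", "results": {}},
}
selected = select_compounds(chemical_hits, biological_hits)  # [60002, 60005]
for line in run_report(selected, records, all_areas_report):
    print(line)
```

Keeping the interface between stages to a plain list of compound numbers is what lets any search module be paired with any report program.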
Figure 3. Flow diagram of modular ACCIS search system (modules: chemical search program, biological search program, selected compounds, structure report program, structure and specific biological area report programs, structure and all biological areas report program)
A suitable chemical search program for CAS Registry II files had already been developed by CAS, while the various report programs are modified versions of current awareness report programs we had previously developed. The only major new program we needed was one for searching the biological and reference information contained in the IMS data base.
Biological Search
The biological data we need to search are contained in an IMS data base, which has a hierarchical structure, as shown in Figure 4. This hierarchical structure allows any number of test areas within a compound, any number of test dates within a test area, any number of test types within a test date, etc. (There is, of course, more detailed information within each segment of the data base than we have depicted.) We developed a search program which provides a very general search capability. It allows us to qualify the search on any piece (or pieces) of information in the data base and has considerable Boolean logic capabilities. For example, if we were interested in compounds within the range RH-60000 to RH-80000 which were active against fungus ABC or DEF, but did not injure crop XYZ at a rate of 4 lbs/acre, we would encode the question as:
(RH>60000*RH