9
Warner-Lambert/Parke-Davis-CAS Registry III Integrated Information System
Downloaded by UNIV LAVAL on April 22, 2018 | https://pubs.acs.org Publication Date: December 14, 1978 | doi: 10.1021/bk-1978-0084.ch009
ROGER D. WESTLAND, RAYMOND L. HOLCOMB, JOHN W. VINSON, JON D. STEELE, ROBERT J. CARDWELL, ROBERT L. SCOTT, THOMAS D. HARKAWAY, PATRICIA J. HYTTINEN, and TINA WILLIAMS
Warner-Lambert/Parke-Davis Pharmaceutical Research Division, Ann Arbor, MI 48105

In 1946 the Parke-Davis Research Laboratories centralized chemical and biological research data using manual methods of storage and retrieval. These were effective until the late 1950's, when manual methods were gradually reinforced with punched card files. By the mid 1960's, machine readable data files were available for everything except a complete chemical structure and certain other structure-related information. Throughout the development of computerized information systems it has been necessary to maintain redundant manual files until nearly all information is computer-readable. Only now, after adding chemical structures to the computer database, can we begin to abandon the manual files maintained for over 30 years.

In addition to structure-handling capability, we have developed a system to link sample inventory and properties, biological screening data, and research document data to produce reports and answers to queries, both interactively and in batch mode.

In considering approaches to computerized chemical structure processing (1, 2, 3), we accepted an offer by Chemical Abstracts Service (CAS) to establish under contract a private satellite of the CAS Registry System (4), which employs over 640 programming modules and over a quarter-million source statements.
Since Warner-Lambert/Parke-Davis (WL/PD) had compatible hardware for both processing and structure printing, we were in a position to take advantage of CAS's large investment in high quality graphics, name processing, and computer edits. CAS offered an advanced and highly developed system which could be installed in a short time at relatively low cost. Ongoing development at CAS to enhance the system for storing, retrieving, and reporting the world's chemical literature made compatibility with CAS attractive. Current use of CAS's service in Europe (5), Japan (6), and the United States (7, 8, 9) evidences increasing reliance on the CAS Registry System and suggests the possibility of broad industrial and governmental use in the future. A pilot project at WL/PD required less than two man-months of effort to implement CAS's structure-printing algorithms from
0-8412-0465-9/78/47-084-132$05.00 Published 1978 American Chemical Society Howe et al.; Retrieval of Medicinal Chemical Information ACS Symposium Series; American Chemical Society: Washington, DC, 1978.
the CAS Graphical Data Structure (10, 11, 12) record. Success of the experiment in plotting structures of the type shown in Figure 1 stimulated further exploration, which ultimately led to the development of a WL/PD-CAS integrated system for storing, retrieving, manipulating, and reporting chemical and biological research data.
System Design

With an INQUIRE® (Infodata Systems Inc., Falls Church, Virginia) database management system available on our IBM 370/168 computer, historical computer files of sample inventory and transactions, physical and chemical properties, biological screening data, research document data, and other miscellaneous files were converted to INQUIRE file format (13) and stored on disk (Figure 2). Sample transactions (to and from physical storage) and inventory data are entered through an on-line balance and a keyboard-CRT terminal interfaced with the central computer. Other WL/PD information is entered in a key-to-disk operation using the ENTREX® (14) system, thereby providing options for direct entry of data from laboratories, when appropriate. Output from the Private Registry files at CAS is converted by means of update programs to INQUIRE file formats. Multi-file searching of the INQUIRE files for ad hoc queries or report construction can be done either interactively with TSO terminals or in batch mode using a Varian V74 computer as a HASP work-station. Generic structure searches of the computer file of fragment-coded structures give as optional output punched paper tape that controls the display of structure images on microfiche. The coded microfiche, each containing 196 structure images at 24X reduction, are stored in the carousel of a storage and retrieval unit manufactured by Image Systems, Inc.
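The multi-file searching described above can be pictured with a small Python sketch. It is only an illustration and does not reproduce INQUIRE's query language. It assumes, hypothetically, that Properties and Biology records share an accession number; the field names (`accession`, `weight_g`, `test`, `result`) and the function names are invented for the example.

```python
from collections import defaultdict


def index_by_accession(records):
    """Group records from one file by accession number (assumed common key)."""
    index = defaultdict(list)
    for record in records:
        index[record["accession"]].append(record)
    return index


def multi_file_search(properties, biology, min_weight_g, test_name):
    """Return samples with enough material on hand that also have results
    for the named biological test. Field names are hypothetical."""
    bio_index = index_by_accession(biology)
    hits = []
    for prop in properties:
        if prop["weight_g"] < min_weight_g:
            continue  # not enough sample on hand
        results = [
            b for b in bio_index.get(prop["accession"], [])
            if b["test"] == test_name
        ]
        if results:
            hits.append({
                "accession": prop["accession"],
                "weight_g": prop["weight_g"],
                "results": results,
            })
    return hits
```

Indexing one file by the shared key before scanning the other avoids comparing every pair of records, which matters once the files hold many thousands of samples.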
Since a new substructure search system for the WL/PD file will not be usable until the entire backlog of structures has been entered into the Private Registry, we are considering programs to algorithmically generate the Parke-Davis Fragmentation Code (15) from CAS connection tables. This will allow us to continue using our present search techniques in the interim.

Properties File. The following data are included in the key-to-disk entry of properties: accession number, source, percent of parent component, melting or boiling point, special handling or storage requirements, physical state, solubility, stability, selected analytical and spectral data, sample weight and location, submission date, and literature references.

Transactions File. A Mettler Model PT320 balance having BCD output, and a CRT terminal are interfaced with the central computer through a microprocessor and the Varian HASP work-station. At the time sample weights are automatically recorded, the
operator keys 1) transaction type, 2) accession number, 3) date, 4) whether the sample is being received and from whom, or being transmitted and to whom, and 5) storage location. While this information is stored in the "Transactions" database, a running record of the amount of sample on hand is calculated from on-line balance entries and stored in the "Properties" database.
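The running record of sample on hand can be sketched as follows. This is a minimal illustration, not the WL/PD implementation. It assumes that each balance reading is the amount received or transmitted, and the transaction codes ("receive", "transmit"), class name, and field names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    kind: str          # "receive" or "transmit" (hypothetical codes)
    accession: str     # accession number keyed by the operator
    date: str
    party: str         # from whom received, or to whom transmitted
    location: str      # storage location
    weight_g: Decimal  # balance reading, assumed to be the amount moved


def apply_transaction(on_hand, tx):
    """Update the running amount on hand for one sample and return it."""
    current = on_hand.get(tx.accession, Decimal("0"))
    if tx.kind == "receive":
        updated = current + tx.weight_g
    elif tx.kind == "transmit":
        if tx.weight_g > current:
            raise ValueError("cannot transmit more sample than is on hand")
        updated = current - tx.weight_g
    else:
        raise ValueError(f"unknown transaction type: {tx.kind!r}")
    on_hand[tx.accession] = updated
    return updated
```

Here `on_hand` stands in for the amount field kept in the "Properties" database, while the `Transaction` records themselves would be kept in the "Transactions" database. `Decimal` is used so that repeated additions and subtractions of balance readings do not accumulate binary floating-point error.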
Biology File. Screening data from biology laboratories are recorded on data entry forms appropriately coded (13) for key-to-disk handling, either in a central location or the laboratory itself. Result forms are customized for each test and are rearranged into a standard format by the ENTREX processor before being sent to the main computer.

Document File. Search parameters of internally generated research reports are included in the Document File (16). Text (word) processing equipment, soon to be acquired, will permit inexpensive recording of selected text such as abstracts. Options available with INQUIRE include techniques that can index and retrieve on the basis of such text. The multi-file option allows selected records to be combined with data from other INQUIRE files.

CAS Files. Machine processing of data must be performed at CAS to take advantage of the many machine validating and duplicate checking features of the CAS Registry System. Although structures and chemical names could be entered at the user's location followed by transmittal of computer-readable data to CAS for processing, CAS's keyboarding conventions and high volume allow them to offer the service at a cheaper rate than we could match internally. Accordingly, data sheets of chemical structures and names are shipped to CAS on a twice-weekly basis (Figure 3). At CAS the hand-written information is checked and edited, and structures, stereo-descriptors, and names are entered by a key-to-disk procedure (17).
Keyboarded records of structures are processed in the Private Registry satellite system with the use of most of the computer edits of the CAS Registry System (17). A distinguishing feature of this process is a check to determine if the newly entered structure also exists in the Registry file of over four million substances. If an exact duplicate is found in the CAS file, the CAS Registry Number along with the CA Index name and synonyms are returned as an update to the WL/PD Names File. Critical to the duplicate check as currently handled is that the entire structure, including the salt or solvate portion, must be identical even as to the proportion of components of a multi-component structure (e.g., RNH2·H2SO4 does not match RNH2·1/2H2SO4). System modifications could remove the limitation. A profile of all WL/PD substances entered into the private WL/PD system is maintained by CAS and checked periodically for matches in the CAS Registry files. Therefore, within 1
Figure 1. Plotted structure of chalcomycin (CAS Registry Number 20283-48-1). Stereochemistry is provided by a "text descriptor" which is printed along with the topological representation shown.
[Diagram labels: DOCUMENTS, BIOLOGY, INVENTORY, INQUIRE DATABASES, DATA ENTRY, BIODATA, PROPERTIES, CONVERSION]