STN Implementation of Factual and Structure Databases - ACS

Chapter 3

STN Implementation of Factual and Structure Databases Andreas Barth

Downloaded by UNIV OF SYDNEY on April 15, 2018 | https://pubs.acs.org Publication Date: August 17, 1990 | doi: 10.1021/bk-1990-0436.ch003

STN International, FIZ Karlsruhe, D-7514 Eggenstein-Leopoldshafen 2, Federal Republic of Germany

Since December 1988, the first part of the Beilstein database has been available online through STN International. This file covers a broad scope of chemical and physical information from 1830 to date. This paper presents an overview of the design and implementation of this file on STN. The design of the database is briefly discussed, followed by a description of Chemical Substance Identification, Chemical Reaction Information, Physical Property Data, General Fields, and Bibliographic Data. The implementation of the database on STN is shown to allow rather sophisticated searches of various properties, including chemical reaction information.

The first part of the Beilstein database of organic substances was introduced in December 1988 on STN International. It is the first time that a factual database containing such a large number of physical and chemical entities has become publicly available as an online database. Initially, the database contained the structures and factual data of approximately 350,000 heterocyclic substances from the printed handbook, covering the time span from 1830 to 1959. Since then the number of substances has increased substantially and is expected to reach approximately 3.5 million organic

0097-6156/90/0436-0024$06.00/0 © 1990 American Chemical Society

Heller; The Beilstein Online Database ACS Symposium Series; American Chemical Society: Washington, DC, 1990.


compounds by March 1990. The database will then contain updates through 1980. In the meantime the Beilstein Institute has begun the extraction of the primary literature starting from 1980. This data is directly input into personal computers using menu-driven excerption programs developed by the Beilstein Institute and Softron GmbH. After a short error checking and reviewing process, the data will be loaded immediately into the online database. Within a few years the Beilstein database will include all the excerpts from the current literature and will be almost up-to-date (i.e. the data will be loaded into the database within less than one year).

There are four different sources contributing to the online database (see also Chapter 2):
- Beilstein handbook (currently up to 1959),
- Literature excerpts on file cards (1960 - 1979),
- Beilstein handbook (1959 - 1979, in print),
- Literature excerpts in machine-readable form (from 1980).

Only the data from the handbook is critically reviewed. In the database this is indicated by the note 'Handbook Data'.

The first source for the online database is the famous Beilstein Handbook of Organic Chemistry, the largest collection of critically reviewed data of organic chemistry. It was published for the first time in 1918 and is now available in the 4th edition, consisting of the Basic Series and four Supplementary Series. This covers the complete published literature on organic chemistry through 1959. Recently, the first volumes of the fifth Supplementary Series have been delivered, and the complete set of volumes will be continuously printed during the next decade(s). Each organic substance is identified by a chemical name given in IUPAC-oriented nomenclature and a structure diagram. In addition, a large set of physical properties and chemical information is described together with the corresponding literature references. The scope of information may cover (1): substance identification information, synthesis and reaction data, structure and energy parameters, state of aggregation, mechanical properties, thermodynamic data, transport phenomena, optical and spectral data, magnetic and electrical data, electrochemical behaviour, and multi-component system data.

In analogy to the printed Handbook, the documents in the Beilstein database are substance-oriented (i.e. all factual information is associated with a well-defined chemical substance and structure).


For the individual chemical substances, the number of associated factual data may vary significantly. The minimum information corresponding to a substance comprises the identification data and one physical or chemical entity. For pyridine, currently the most comprehensive substance, all factual data are available and an offline print consists of more than 540 pages. With each factual data field, there is at least one literature reference and sometimes an additional note giving further information. Statistics on the number of substances per entity are given in Figure 1 as a bar chart. These statistics are taken from the current database of 1,745,686 substances with 460,846 records from the handbook file (heterocyclic and acyclic substances) and 1,284,840 records from the excerpt file (heterocyclic substances). It can be seen from this figure that the major part of factual data is comprised of preparation data (PRE, 87.4%), reaction data (REA, 10.2%), melting point (MP, 68.2%), and boiling point (BP, 12%). While this may be a result of the content of the current file, it is not expected that these figures will vary significantly when the file contains a more balanced set of substance information from the handbook and the excerpts.

Design of the Database

According to the data structure of the Beilstein database, there are several hundred search fields and more than 140 different display formats (2, 3). As shown in the first section, the type and amount of information which is available for a particular substance may vary significantly. To obtain an overview of the available fields for a given substance the display format FA (Field Availability) can be used. In Figure 2, the table of contents for the substance tryptophane is shown. The Messenger command language used here is described elsewhere (4). In the first column of this table the display formats are given, the full name is shown in the second column and the number of occurrences is displayed in the last column. In this case, we find that there are 2 occurrences of preparation. The number of occurrences is a direct indication of the number of different preparation methods for the substance. To display the data for the substance, one may simply use the codes from the first column of Figure 2. A display of IDE (Identification of Substance including BRN + CN + MF + SO + FW + LN + STR) and PRE (Preparation) is shown in


Figure 1. Percentage of database records containing certain properties.

=> SEARCH tryptophan/CN
L9          1 TRYPTOPHAN/CN

=> DISPLAY FA
L9   ANSWER 1 OF 1

Code   Field Name                        Occur.
MF     Molecular Formula                 1
CN     Chemical Name                     1
FW     Formula Weight                    1
SO     Beilstein Citation                1
LN     Lawson Number                     1
NTE    Notes                             1
PRE    Preparation                       2
MP     Melting Point                     6
REA    Chemical Reaction                 15
RSTR   Related Structure                 3
INP    Isolation from Natural Product    6
CPD    Crystal Property Description      2
ORP    Optical Rotatory Power            4

Figure 2. Table of Contents for Tryptophane.


Figure 3. It should be noted that any combination of formats including combined (predefined) and custom formats is allowed.

For a deeper understanding of the Beilstein database, it is necessary to outline the structure of a Beilstein document. There are essentially four different information levels (see Figure 4). All the substance identification information comprises the first level. It is actually associated with a registered Beilstein compound (title compound). On the second level, one finds the availability information. This comprises the search fields Field Availability (FA), Property Hierarchy (PH), Controlled Terms (CT), and Controlled Terms of Multi-Component Systems (CTM). The content of these fields indicates whether there is information available for a specific property. The factual information, i.e. numeric values of properties and reaction information, is found on the third level (measurement). On the fourth level the bibliographic information is given referring to the measurements of the next higher level.

There is a one-to-many relationship between the higher and the next lower level in this hierarchy. This means that for a given substance, there could be many properties available, for each property there could be several measurements, and for a measurement there could be more than one citation. A search could be performed in fields of different levels.
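The four levels and their one-to-many relationships can be pictured as nested records. The following Python fragment is only a sketch under stated assumptions: the class names, attribute names, and placeholder values are hypothetical and do not reflect the actual STN file design. It illustrates how an FA-style table of occurrences (second level) could be derived from the measurements stored on the third level.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Citation:
    """Level 4: bibliographic data for one measurement (hypothetical layout)."""
    text: str


@dataclass
class Measurement:
    """Level 3: one instance of data, e.g. one preparation or one melting point."""
    field_code: str                      # display/search code such as "PRE" or "MP"
    value: str                           # placeholder for the factual content
    citations: List[Citation] = field(default_factory=list)


@dataclass
class Substance:
    """Level 1: a registered title compound identified by its BRN."""
    brn: int
    name: str
    measurements: List[Measurement] = field(default_factory=list)

    def field_availability(self) -> Dict[str, int]:
        """Level 2: count the occurrences per field code, as in an FA display."""
        return dict(Counter(m.field_code for m in self.measurements))


# Illustrative record only; the measurement values are placeholders.
trp = Substance(
    brn=86196,
    name="Tryptophan",
    measurements=[
        Measurement("PRE", "educt: casein",
                    [Citation("Organic Syntheses, 10, S. 100")]),
        Measurement("PRE", "(second method, details omitted)"),
        Measurement("MP", "(value omitted)"),
    ],
)
print(trp.field_availability())
```

In this sketch a search hit on any level would still be reported as the enclosing `Substance`, which mirrors the statement that an answer always refers to a title compound.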
In any case, the answer set will consist of Beilstein Registry Numbers (BRN) plus some additional information. This means that the result of a search is always a Beilstein title compound. The information about the search level can be reconstructed from the additional information in the answer set and is used for the various display formats. A certain value of the (P)-proximity is assigned to each measurement (instance of data) and the individual references are associated with a particular (S)-proximity value.

All information of the first two levels is indexed in standard search fields, e.g. CN (Chemical Name) or MF (Molecular Formula). The factual data has a slightly different structure since the main entity generally depends on further parameters. As an example, the data structure of Dipole Moment is shown in Figure 5. Here, the main entity depends upon the parameters Temperature, Method, and Solvent. The values for the Dipole Moment and the corresponding Temperature are given in Debye and degree Celsius, respectively. The (P)-operator allows one to perform a search of the Dipole


=> display ide pre

L1   ANSWER 1 OF 1

BRN  86196 Beilstein
MF   C11 H12 N2 O2
CN   Tryptophan
FW   204.23
SO   0-22-00-00546; 1-22-00-00677; 0-22-00-00550; 4-22-00-06765; 5-22
NTE  stereoisomeres of unknown configuration.
LN   27812

[Structure diagram of tryptophan]

Preparation:
PRE
  Educt: casein
  Detail: Reaktion ueber mehrere Stufen [reaction over several steps]
  Reference(s):
  1. Organic Syntheses, 10, S. 100
  2. Dakin, Biochem.J. 12 202, CODEN: BIJOAK
  3. F. Hoppe-Seyler, H. Thierfelder, Physiologisch- und pathologisch-chemische Analyse, 9. Aufl. <Berlin 1924>, S. 313
  4. Abderhalden, Chem.Ber. 42, 2333, CODEN: CHBEAM
  5. Abderhalden, Kempe, Hoppe-Seyler's Z.Physiol.Chem. 52, 208, CODEN: HSZPAZ
  6. Hopkins, Cole, J.Physiology 27, 420 J.Physiology, 29, 453
  Note(s):
  7. Handbook Data
  8. l-tryptophan /

Figure 3. Display of data for Tryptophane.

Level 1   Substance Identification   BRN, MF, CN, STR, SO, ...
             | 1:n
Level 2   Factual Data               FA, PH, ...
             | 1:n
Level 3   Instances of Data          PRE, REA, MP, RI, ...
             | 1:n
Level 4   Bibliographic Data         AU, PY, ISN, JT, PN, ...

Figure 4. Document structure and information levels.


Moment at specific parameter values. The main qualifier (DM) is also used to display the data for Dipole Moment.

In addition to the custom formats consisting only of a single entity, there is a hierarchy of combined (predefined) display formats. These enable the user to display almost any amount of data using a single display format. There is also a dynamic display feature which shows the customer the data related to his/her search query plus the substance identification information (QRD = Query-related Data). To display the content of a Beilstein document, the Field Availability (FA) is used (see Figure 2).

In the Beilstein database, the information is given in different forms, as textual data (free text), keywords, numeric values (ranges), and structures. Free text and keywords can be searched in the same way as in bibliographic databases. Numeric values are searched as ranges (see Chapter 8) using the numeric relation operators, e.g. '<' (less than) or '>' (greater than). Chemical structures can be searched as exact structures or as substructures. They can be built either offline using a graphical structure editor or online using the Messenger STRUCTURE command. Examples for online searches are presented in the next sections.

Chemical Substance Identification

All substances in the Beilstein database are identified by a sequential Beilstein Registry Number (BRN). In addition, there is a Chemical Name (CN) and Synonyms (SY), a Molecular Formula (MF) and related formulas, a Formula Weight (FW), and a Structure (STR). All these fields are both searchable and displayable. Furthermore, there are many additional search fields generated from these input fields.

Chemical Names are given in IUPAC-oriented nomenclature. They are indexed as complete names in CN and as parsed segments in the fields CNS (Chemical Name Segments) and BI (Basic Index). The segments are generated using two different algorithms:
1. by parsing the names at all special characters like hyphen and comma, and
2. by applying a dictionary of natural segments developed by the Beilstein Institute.

Chemical name segments can be searched using the well-known proximity operators (S), (W), and (A). An example for such a search is given in Figure 6. Here, we are searching for derivatives of salicylaldehyde. The answer set comprises 59 hits and the 4th answer is also shown in Figure 6. A hit resulting from a search in the Basic Index may also stem from a starting material of a reaction or from a by-product. These names are also


Field Name       Field Qualifier   Unit
Dipole Moment    DM                D
Temperature      DM.T              CEL (°C)
Method           DM.MET
Solvent          DM.SOL

Figure 5. Design of Physical Entities: Dipole Moment
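The role of the (P)-proximity for an entity such as the Dipole Moment of Figure 5 can be illustrated with a small sketch. It contrasts a document-level combination of two range conditions with a combination that must hold within one and the same measurement. The record layout, the function names, and the numeric values are assumptions made for illustration only; they are not taken from the Messenger software or from the database.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Range = Tuple[float, float]


@dataclass
class DipoleMeasurement:
    """One measurement of the entity DM with its parameters (hypothetical layout)."""
    dm: float                  # dipole moment in debye (field DM)
    t: Optional[float]         # temperature in degrees Celsius (field DM.T)
    solvent: Optional[str]     # solvent (field DM.SOL)


def in_range(x: Optional[float], bounds: Range) -> bool:
    """True if a value is present and lies within the closed range."""
    lo, hi = bounds
    return x is not None and lo <= x <= hi


def match_document_level(ms: List[DipoleMeasurement], dm: Range, t: Range) -> bool:
    """Plain AND: the two conditions may be met by different measurements."""
    return any(in_range(m.dm, dm) for m in ms) and any(in_range(m.t, t) for m in ms)


def match_same_measurement(ms: List[DipoleMeasurement], dm: Range, t: Range) -> bool:
    """(P)-style proximity: both conditions must hold in one measurement."""
    return any(in_range(m.dm, dm) and in_range(m.t, t) for m in ms)


# Made-up measurements for a single substance.
measurements = [
    DipoleMeasurement(dm=1.5, t=25.0, solvent="benzene"),
    DipoleMeasurement(dm=2.3, t=60.0, solvent="dioxane"),
]
query_dm, query_t = (2.0, 2.5), (20.0, 30.0)
print(match_document_level(measurements, query_dm, query_t))    # conditions met separately
print(match_same_measurement(measurements, query_dm, query_t))  # no single measurement fits
```

The substance in this example would be retrieved by the document-level query although no single measurement satisfies both conditions, which is the kind of false hit a measurement-level proximity is meant to exclude.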

=> search salicyl (w) aldehyde/cns
          147 SALICYL
         2814 ALDEHYDE/CNS
L1         59 SALICYL (W) ALDEHYDE/CNS

=> display 4

L1   ANSWER 4 OF 59

BRN  351066 Beilstein
MF   C25 H26 N2 O
CN   salicylaldehyde- Salicylaldehyd-
FW   370.49
SO   2-20-00-00179
LN   24291; 14535; 8629

Figure 6. Example Search Using Chemical Name Segments
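The two segmentation steps for chemical names and an adjacency test in the spirit of the (W) operator can be sketched as follows. This is a minimal illustration and not the algorithm used at STN: the small segment dictionary is a hypothetical stand-in for the dictionary of the Beilstein Institute, the function names are invented, and (W) is assumed to mean "adjacent and in the given order".

```python
import re
from typing import List, Sequence

# Hypothetical mini-dictionary; the real dictionary of natural segments is not reproduced here.
NATURAL_SEGMENTS = ("salicyl", "aldehyde", "aldehyd", "hydroxy", "benz")


def split_on_special(name: str) -> List[str]:
    """Algorithm 1: cut the name at special characters such as hyphen and comma."""
    return [p for p in re.split(r"[-,()\[\]\s]+", name.lower()) if p]


def split_by_dictionary(piece: str, dictionary: Sequence[str] = NATURAL_SEGMENTS) -> List[str]:
    """Algorithm 2 (simplified): greedy longest match against a segment dictionary.

    Characters not covered by any dictionary entry are skipped.
    """
    segments, i = [], 0
    while i < len(piece):
        match = max((d for d in dictionary if piece.startswith(d, i)), key=len, default=None)
        if match:
            segments.append(match)
            i += len(match)
        else:
            i += 1
    return segments


def name_segments(name: str) -> List[str]:
    """Combine both steps; keep a piece whole if the dictionary finds nothing in it."""
    segments: List[str] = []
    for piece in split_on_special(name):
        segments.extend(split_by_dictionary(piece) or [piece])
    return segments


def adjacent(segments: List[str], first: str, second: str) -> bool:
    """(W)-style test: 'first' is immediately followed by 'second'."""
    return any(a == first and b == second for a, b in zip(segments, segments[1:]))


segs = name_segments("Salicylaldehyde-hydrazone")
print(segs)
print(adjacent(segs, "salicyl", "aldehyde"))
```

A name such as "2-hydroxy-benzaldehyde" yields the segment "aldehyde" as well, but not preceded by "salicyl", so it would not satisfy the adjacency condition in this sketch.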


indexed in the Basic Index, but they are not necessarily registered Beilstein compounds.

The Molecular Formula (MF) and the associated search fields are another possibility to identify a chemical substance. A number of additional search terms is generated from the molecular formula (see Figure 7). At first, the Molecular Formula is indexed in MF and BI (Basic Index). In addition, the Single Atom Counts are generated for all chemical elements plus some pseudo atom counts like X (Halogen Atoms) and M (Metal Atoms). For each element the corresponding periodic group and element group is created. Furthermore, a total Element Count (ELC), a total Atom Count (ATC), and the Element Symbols (ELS) are indexed. The latter fields are especially useful to limit the search to certain ranges of atoms or elements.

Chemical structures are the most important key to identify substances in the Beilstein database. The user interaction and the substructure search capabilities are identical to those of the CAS Registry database and are described in detail elsewhere (5). In Figure 8, a substructure search for derivatives of adenine is shown. Using PC-based software, like STN Express or Molkick, it is also possible to build structures offline and upload the connection tables into the database.
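The search terms derived from a molecular formula, as described above, can be illustrated with a short sketch. It assumes that ELC is the number of different elements, ATC the total number of atoms, and ELS the list of element symbols; the function name and the output layout are hypothetical. The metal pseudo count M and the periodic group terms are omitted because they would require an element table.

```python
import re
from typing import Dict, List, Union

HALOGENS = {"F", "Cl", "Br", "I", "At"}


def formula_terms(mf: str) -> Dict[str, Union[int, List[str], Dict[str, int]]]:
    """Derive illustrative index terms from a formula such as 'C11 H12 N2 O2'."""
    counts: Dict[str, int] = {}
    # An element symbol is one capital letter plus an optional lowercase letter,
    # followed by an optional count (a missing count means 1).
    for symbol, n in re.findall(r"([A-Z][a-z]?)(\d*)", mf):
        counts[symbol] = counts.get(symbol, 0) + (int(n) if n else 1)
    return {
        "atom_counts": counts,                                            # single atom counts
        "X": sum(n for s, n in counts.items() if s in HALOGENS),          # halogen pseudo count
        "ELC": len(counts),                                               # number of elements
        "ATC": sum(counts.values()),                                      # total number of atoms
        "ELS": sorted(counts),                                            # element symbols
    }


terms = formula_terms("C11 H12 N2 O2")  # formula of tryptophan as displayed in Figure 3
print(terms)
```

With such terms a query could, for example, be restricted to substances containing exactly four elements, or to a given range of total atoms, without specifying the full formula.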
It is also possible to use the Lawson Number (LN) for searches of substances in Beilstein. This number is a fragment code based on the Beilstein system. Using this number the customer may perform similarity searches or browse through a set of substances. The possibilities to use the Lawson Number are described in detail in Chapter 10 of this book.

Chemical Reaction Information

Although the Beilstein database is not a typical reaction database, there is a large amount of chemical reaction information available. It is possible to find data on substance preparation, chemical behaviour and isolation from natural products (biosynthesis). Currently, all searches must be performed as text searches for chemical name segments or as searches for the availability of data. This is certainly a limitation for reaction searches. In many cases, one retrieves the complete reaction information, including the literature references. There are some cases, however, which consist only of a reference.

Using the Field Availability (FA), one could search for the availability of reaction data. In the following example we are interested in the biosynthesis of flavone
