Chapter 3
Chemical Graphics: Bringing Chemists into the Picture
Trisha M. Johns
G. D. Searle & Co., Skokie, IL 60077
Downloaded by UNIV OF MISSOURI COLUMBIA on April 16, 2018 | https://pubs.acs.org Publication Date: June 15, 1987 | doi: 10.1021/bk-1987-0341.ch003
It is interesting to note that the literature of chemical documentation through the past twenty years has had a few central themes: one is the need for systems to denote chemical structures in computer-readable form, and another is the need to educate chemists in literature retrieval. Until recently, one theme has precluded the other: as long as the standard computer systems were limited to non-graphical representations, chemical information retrieval would remain sufficiently mysterious to the end user chemist. As structure graphics systems have only recently become technologically and economically feasible, end user chemists are now beginning to become computer users. The following discussion is a case study of the "chemists' computer revolution" at G. D. Searle.
Economics and practicality have been the driving force behind the development of chemical documentation from its early days. The "hieroglyphic"-type symbols in use in the mid-18th century could not be accommodated by the printing press, which led Berzelius to suggest in 1814 a nomenclature based on letters and numbers (see Figure 1), where each element would be represented by the first letter of its Latin name, and the number of atoms in the molecule be designated by a number to the upper right of the elemental symbol. When Berzelius later "simplified" his notation by using bars, dots and commas to denote numbers of atoms, he only made it worse. Liebig (1) proposed in 1834 that to facilitate printing, superimposed notations not be used, and that the numbers be written below and to the right of the element symbol (see Figure 2). The familiar pictorial (valence) notation was first suggested by Couper, and popularized by chemists of the day like Kekule (2). Printing technology kept up with further development of the two- and three-dimensional chemical graphs which not only aided visualization of the molecules, but actually came to be depended on by chemists to express their ideas. As the number of known compounds grew, it became necessary to begin cataloging them.
0097-6156/87/0341-0018$06.00/0 © 1987 American Chemical Society
Warr; Graphics for Chemical Structures ACS Symposium Series; American Chemical Society: Washington, DC, 1987.
Verbal communication had necessitated the development of chemical nomenclature, and its standardization made indexes of the chemical literature possible. The earliest tracking systems depended on conventional substructure methods with edge-punched cards. The efforts in computer technology were based on non-graphic business applications, so it was most practical for the chemical information systems to be made to adapt to the standard business computer. Systems evolved that were based on linear representations of graphical formulas, such as Wiswesser Line Notation, WLN (see Figure 3), or standard chemical nomenclature. But the linear representations were one step further removed from the 2-D structure that the chemist so depended on, and involved learning what in essence was a foreign language. The chemists found themselves more and more alienated from their own literature, and information intermediaries found their place in the sun.
The Searle Case
That economics played a role early on at Searle was mentioned in a paper my predecessor Dr. Howard Bonnett wrote in the Journal of Chemical Documentation in 1962 (3). Work was being done at that time on computer representation of 2-D chemical graphics, but it required prohibitively expensive, single-application equipment. He noted that one of the primary reasons Searle had adopted Wiswesser Line Notation was that the notation could be used on a standard accounting computer and did not require a high capital outlay.
When in the early Sixties Searle encoded its internal chemical structures into Wiswesser Line Notation, chemists were for the first time able to quickly discover the presence or absence of a compound in the file, or what sort of analogues had been made of a certain structure, first through alphabetic computer listings, then later through permuted indexes and batch searching of the WLN.
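The permuted indexes mentioned above can be illustrated with a minimal sketch. It assumes a simple rotation scheme in which every position of a notation becomes an index entry; how the Searle listings were actually generated is not described in this chapter, and the function names and sample strings below are hypothetical.

```python
def permuted_index(notations):
    """Build a permuted (rotated) index over line-notation strings.

    Every non-blank position of every notation becomes an index entry:
    the string is rotated so that position comes first, and the part
    that was rotated away is kept after a " / " separator. Sorting the
    entries groups together all notations sharing a symbol sequence,
    wherever in the notation that sequence occurs.
    """
    entries = []
    for notation in notations:
        for i, symbol in enumerate(notation):
            if symbol == " ":
                continue  # never start an entry on a blank
            key = notation if i == 0 else notation[i:] + " / " + notation[:i]
            entries.append((key, notation))
    return sorted(entries)


def lookup(index, prefix):
    """Return the notations having an index entry that starts with prefix."""
    return sorted({notation for key, notation in index if key.startswith(prefix)})


# Illustrative strings in the style of a line notation; not taken from the Searle file.
sample = ["QVR", "QV1", "1V1", "QR"]
index = permuted_index(sample)
hits = lookup(index, "V1")
```

Because every rotation is listed, a reader scanning the sorted printout under a given symbol sequence finds each notation that contains it, which is the kind of analogue lookup an alphabetic listing of whole notations cannot provide.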
The information scientist was an integral part of the process, encoding the request, doing the search, and interpreting the WLN results into the 2-D graphics the chemist could appreciate. Through the years, many improvements were made to the original system, until the late Seventies, when the "accounting" computers were replaced by scientifically adaptable, interactive machines, and our batch searching evolved to online retrieval using CRTs. The WLN database itself had not changed significantly (other than to grow), but it was augmented by the generation of WLN-fragments, which allowed for a sophisticated online searching system (4). As the system continued to be based on WLN, the information scientist ran the searches, and the output continued to be interpreted for the end user chemist.
The WLN-fragment search was notable in that its use did not require full knowledge of the notation and its rules. A "reading" knowledge of WLN was all that was required, and armed with a WLN-fragment dictionary and a short training course, a few adventurous chemists actually used the system themselves. It was not for lack of enthusiasm that the system did not receive full support from all the chemists. What prevented chemists at that time from using the system was the nonavailability of computer terminals. A big barrier yet to be overcome was the tendency of some in upper management to view with suspicion any scientist's activity that did not require beakers and flasks, and it was not until those basic attitudes changed that the computer revolution for chemists could possibly have happened.
Figure 1. Early chemical hieroglyphs. [Figure not reproduced: symbols for compounds such as hydrochloric acid, calcium chloride, potassium sulfite, potassium sulfate, lead sulphate, zinc oxide, water and hydrogen sulfide in the notations of Scheele (1772), Lavoisier (1787), Dalton (1835) and Berzelius (1814 and ca. 1830).]
Figure 2. Development of graphics-oriented representations. [Figure not reproduced: butane as written by Liebig (1834, C4H10), by Couper and Kekule (1858), and in contemporary form.]
Figure 3. Linear representations of graphical formula: standardized chemical nomenclature and Wiswesser Line Notation. [Figure not reproduced: structure diagram with the name (E)-1,4,5,6-tetrahydro-1-methyl-2-[2-(3-methyl-2-thienyl)ethenyl]pyrimidine and the WLN T6N CN AUTJ C1 B1U1- BT5SJ C1.]
The Move to Graphics
While end user chemists had never been involved in our in-house chemical database development, when it became practical to look at graphics alternatives, they were an integral part of the decision-making. A task force of information specialists, systems analysts, and chemists evaluated the course of action: whether to build a graphics system in-house, or to buy existing software. Economic factors continued to prevail, and it was seen that in-house development, however much preferred from a customization standpoint, would take too long to complete. The result of several months of discussion, demonstrations and site visits was the decision to license Molecular Design Ltd.'s MACCS system, the now well-known molecular connectivity-based retrieval system.
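For readers unfamiliar with the idea behind a connectivity-based system, the following minimal sketch shows a generic connection table: a list of atoms plus a list of bonds between them. It is an illustration only and assumes nothing about the actual MACCS record layout; the class name, method names and the example molecule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConnectionTable:
    """A generic connection table: atoms by index, bonds as index pairs."""
    atoms: list  # element symbols; the list position is the atom number
    bonds: list  # (atom_i, atom_j, bond_order) tuples

    def neighbors(self, i):
        """Return (neighbor_index, bond_order) pairs for atom i."""
        found = []
        for a, b, order in self.bonds:
            if a == i:
                found.append((b, order))
            elif b == i:
                found.append((a, order))
        return found

    def has_bond(self, elem_a, elem_b, order):
        """True if some bond of this order joins the two element types."""
        for a, b, o in self.bonds:
            pair = (self.atoms[a], self.atoms[b])
            if o == order and pair in ((elem_a, elem_b), (elem_b, elem_a)):
                return True
        return False


# Heavy atoms of acetic acid: CH3-C(=O)-OH, hydrogens left implicit.
acetic_acid = ConnectionTable(
    atoms=["C", "C", "O", "O"],
    bonds=[(0, 1, 1), (1, 2, 2), (1, 3, 1)],
)
carbonyl_present = acetic_acid.has_bond("C", "O", 2)
```

The point of such a table is that it records the same atoms and bonds the chemist draws, so a query can be posed as a drawn fragment rather than as a string in a separate notation.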
Besides providing Searle with twenty years of access to and organization of its compounds, the WLN turned out to be the crucial link to the chemical graphics system towards which we had been working. Ironically, it was at the time that we were replacing the WLN that its strength became most evident. Though well-adapted to computer manipulation, WLN was originally devised without this in mind, as a rule-based, linearizing method of indexing structures. WLN withstood the dispassionate logic of the computer, and through another purchased program, DARING, from Fraser Williams (Scientific Systems) Ltd., our database of 50,000 WLNs was converted automatically to connection tables with an error rate of less than four per cent. Simple conversion of the DARING connection table to MACCS connection table format and manual entry of the errors (about 2,000 compounds all told) allowed the entire database to be converted and searchable in MACCS within seven months.
At last, the end user was able to use what has become the standard language of chemistry to access internal chemical information. Information scientists developed a full-day training program and user's guide, and trained 100 Searle chemists to use the new database by the end of 1984, only nine months after acquisition of MACCS.
That there was a change in attitude by R&D management toward this first approach to end user chemist searching was due in no small part to their investment in the decision. But success required more than approval from above, more than a graphics database, and more than a user-friendly system. What still needed to happen was the proliferation of graphics terminals before end user searching could become practical.
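The conversion figures quoted above are mutually consistent, as a short calculation shows. The sketch below uses only the numbers given in the text (50,000 notations, an error rate under four per cent, about 2,000 manual entries, seven months); the variable names are hypothetical, and the monthly rate is a simple average rather than a reported figure.

```python
DATABASE_SIZE = 50_000        # WLNs in the Searle file, from the text
MAX_ERROR_PERCENT = 4         # "less than four per cent"
MANUAL_ENTRIES = 2_000        # "about 2,000 compounds all told"
CONVERSION_MONTHS = 7         # "within seven months"

# Upper bound on failed automatic conversions implied by the error rate.
max_failures = DATABASE_SIZE * MAX_ERROR_PERCENT // 100

# Compounds that went through without manual intervention.
converted_automatically = DATABASE_SIZE - MANUAL_ENTRIES

# Average throughput if the work had been spread evenly (an assumption).
average_per_month = DATABASE_SIZE / CONVERSION_MONTHS

print(f"At most {max_failures} failures; "
      f"{converted_automatically} converted automatically; "
      f"about {round(average_per_month)} compounds per month on average.")
```

The bound of 2,000 failures matches the number of compounds entered by hand, so roughly 48,000 structures were carried over from WLN without manual redrawing.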
Hardware
Without convenient terminals and hard-copy devices, the situation for the chemist would be little better than the early days of chemical graphics, when even the largest of companies would only invest in a few expensive workstations and chemists would have to leave their laboratories to run a search. The real turning point that made the computer revolution happen was that systems were beginning to be hardware-independent and prices were continually decreasing in the competitive computer market. Simple graphics terminals like the Envision terminals that had been used for database development had cost us around $8,000 each; today there are comparable graphics terminals in the neighborhood of $2,000. The standard VT-100 terminal just a few years ago cost $2,000 but now can be had for about $600. To bring an entire department into the computer age means a large capital investment, but the current pricing structure has allowed this to happen at a much faster rate than could have been envisaged earlier.
The required terminal would have to be inexpensive, with sufficiently high graphics resolution, and would have to be compatible with MACCS but also usable for a variety of applications. Our experience with the Envision terminal showed that it was not robust or cheap enough for mass distribution. The Apple Macintosh was chosen as the chemist's workstation since it fit the basic requirements and had other advantages as well.
Choice of the Apple Macintosh
The small screen of the Macintosh was initially alarming, but after a few minutes' use, one realizes that the effective resolution is sufficiently high to make the images clear and easy on the eyes. The price of the Macintosh when we made our first purchases was around $1600, less than the VT-100 a few years ago, and well within the range of normal office equipment expenditures. Its compact design makes the Macintosh fit well into cramped laboratory settings, and the fact that it is portable aided its introduction to the chemists, who were encouraged to take it home for practice.
The short learning curve is a distinguishing feature of the Macintosh. Pull-down menus lessen the number of commands that have to be remembered, and the use of a mouse rather than the keyboard makes traditional typing skills less critical.
To make the Macintosh compatible with MACCS, we use the Versaterm Pro software package, which allows the Macintosh to emulate the Tektronix 4105, one of the acceptable terminal types. With this same package, the Macintosh can emulate the Tektronix 4014 and VT-100, which allows the chemists to use other application programs, such as SYNLIB for chemical reaction literature and, of course, all the standard non-graphics applications such as electronic mail.
The Macintosh can be hooked up to a high quality laser printer (the Apple LaserWriter) via the AppleTalk network software, or when used as a terminal, it can direct output from systems such as MACCS or SYNLIB to a local laser printer. (The Figures in this Chapter were done on the Macintosh and printed on the LaserWriter.) Another deciding factor in the choice of the Macintosh was the availability of quality software.
Software Examples
Once chemists became familiar with the Macintosh and started using the internal compound database, they began to branch out and discover their own computer applications. Chemical structures produced by MACCS, while fine for internal reports or correspondence, are not of publication quality. The ChemDraw software from Cambridge Scientific Computing, Inc., is now being used extensively for situations demanding high quality structures, such as slide presentations, and also for merging structures with text. Using a combination of ChemDraw and MacWrite, for example, the chemist can insert chemical structures into word processing documents. The Apple Switcher utility enables the chemist automatically to switch back and forth among several programs to create the desired report. Sample output from this simple process is shown in Figure 4. The text was written using MacWrite, with the full Macintosh complement of fonts, styles, special features like bolding, etc. The chemical reaction sequence was drawn using ChemDraw.
Some of the chemists' demands for structures are based on specialties, for instance, peptides, where the need is for a hybrid notation rich in text but with some structural elements. Figure 5, for example, shows the well-known neuropeptide vasopressin, drawn with ChemDraw.
For chemical reaction literature, chemists use the SYNLIB database from Distributed Chemical Graphics. SYNLIB is a well-documented, user-friendly system, designed for end user browsing. Figure 6 shows sample SYNLIB printed output, two records to the page.
For chemical supplier information, the Fine Chemicals Directory from Fraser Williams (Scientific Systems) Ltd. is available through MACCS, as is an internally developed database of chemicals available in our Chemical Stockroom.
Molecular Modeling
For those with an interest in true 3-D structures, we have also a number of user-friendly molecular modeling packages available on our system, among them SYBYL, from Tripos Associates, Inc., and Macromodel, from Prof. Clark Still, Columbia University. Both can be accessed via the Versaterm Pro emulation software on the Macintosh, as well as from intermediate and high-performance workstations like the NEC-APC and the Evans and Sutherland PS-300. These specialist packages are maintained by our Drug Design department, who help out users with the specifics of the software.
Figure 4. Example using the Apple Switcher utility: text done in MacWrite, reaction sequence in ChemDraw. [Text of the sample report follows; the reaction scheme is not reproduced.]
An abstract in a recent CA Selects drew my attention to the fact that enzymes can be used to hydrolyse hydantoins to amino acids under mild conditions, and in many cases can selectively convert DL starting materials to pure D or L products, often quantitatively:
[Reaction scheme drawn in ChemDraw: hydantoin (substituents R1, R2) + H2O to the D or L amino acid.]
When one of the substituents is hydrogen some of these enzymes function as both aminohydrolases and as racemases, leading to 100% conversion of racemic hydantoin to just one amino acid enantiomer; others selectively hydrolyse one hydantoin enantiomer leaving the other unchanged. Whether the pure D or L amino acid is produced depends on the particular enzyme system involved. The second step, cleavage of the intermediate N-carbamoyl amino acid to the free amino acid can be enzymatic or chemical, but in either case is achieved without racemization under relatively mild conditions.
I had Information Services do a CA search on this topic and a copy of the 21 references found is attached. From the number of recent patents on the subject it would appear that this has become a method of some industrial importance, especially for the production of optically pure unnatural and D-amino acids. The actual experimental conditions employed range from the use of cultured broths of common microorganisms, through the use of cell-free extracts, to the use of columns of fully immobilized enzymes embedded on cellulose triacetate fibres, and reactions are rapid at 30°.
Figure 5. Example showing the versatility of ChemDraw. [Figure content: "Neurohypophysial Hormones: Neuropeptides from the Pituitary". Vasopressin (VP, β-hypophamine; antidiuretic, pressor). Cys1-Tyr2-Phe3-Gln4-Asn5-Cys6-Pro7-Arg8-Gly9-NH2: human, bovine, chicken, horse, sheep, cat, dog, camel, rat. Cys1-Tyr2-Phe3-Gln4-Asn5-Cys6-Pro7-Lys8-Gly9-NH2: porcine, hippopotamus.]
Figure 6. Sample output from SYNLIB. [Figure not reproduced: a SYNLIB V2.2 printout of two reaction records, each with a reaction scheme, yield, conditions and literature reference.]
Conclusion
Bringing the chemists into the picture by bringing pictures to the chemists' laboratories has brought about a revolutionary change in the way the chemists do their work. It has happened at Searle through a combination of events, in some cases quite fortuitously. It depended on a needed 2-D structure database and user-friendly system, on the approval of R&D management for end user searching, on cheap, multi-purpose graphics terminals, on user involvement and training, and on the chemists' cultivating the habit of using a computer terminal.
This last point is very important. People who work with computers day in and day out are in the habit of logging on, reading their mail, and doing their work. Scientists who use the computer only as an adjunct to their work need opportunities to stay familiar with it, such as regular use of electronic mail, or keeping private files up-to-date. For such casual users, the retraining curve must not be too steep, or they will be discouraged. Of course, this is where an online HELP system or pull-down menus can prove invaluable.
The part of the information scientist in the revolution should not be underestimated. For guaranteed, all-around success, someone had to work out all the details beforehand, from establishing a quality database, to selecting the hardware and initial software, to providing in-depth training. Information scientists are in the unique position of having their heart in the subject matter, as well as knowing the computer systems.
There has been a change in their role as well. They are not in the middle of every internal search request, but the more complex questions continue to be referred to them. Information scientists continue to do the database maintenance and development, while taking on additional responsibilities for training and "customer support", and evaluation of new systems and applications.
While a quantum leap has been made, there is still much to be done. Medicinal chemists want to see structures and data together, and while we automatically can get biology data from results of a substructure search, or get structures printed for compounds with certain biology data parameters, our current system is not flexible enough and the databases are not truly integrated.
While the chemists have tools, like ChemDraw, to produce chemical structures to their specifications, they still have to do it themselves. Since beauty is in the eye of the beholder, there will always be an artistic (subjective) dimension to chemical structure drawing that will make complete automation inconceivable. For example, the Merck Index shows the benzomorphan Metazocine as a highly stylized depiction of this bridged tricyclic structure, which represents only one of many opinions as to how it should be drawn (see Figure 7).
Figure 7. Two graphical representations of the same compound. [Figure not reproduced: the benzomorphan Metazocine drawn in two ways, one of them the Merck Index representation.]
Another deficiency is that a single software package may not satisfy all requirements. Structures stored in MACCS may not be drawn in a preferred way, so a package like ChemDraw is used. The Switcher program was written because several packages may need to be used to get the desired result. The chemists' requirements are severe, and we have not met them all.
The theme of this Chapter is bringing computer graphics to the laboratory. Perhaps the natural consequence of this will really be to reunite chemists with their literature. While it may not be economically viable now for end user searching of commercial databases, there are alternatives, such as optical disks, that are being developed for low-cost end user browsing, which in the future might provide in-house access without the risk of cost overruns. By helping chemists become computer literate, we are approaching that goal.
Acknowledgments
The author would like to thank Michael Clare for sharing his expertise, for contributing helpful suggestions, and for providing Figures 4 and 5.
Literature Cited
1. Jorpes, J. E., Jac. Berzelius, His Life and Work; Almqvist & Wiksell: Stockholm, 1966.
2. Pauling, L., General Chemistry: An Introduction to Descriptive Chemistry and Modern Chemical Theory; W. H. Freeman: San Francisco and London, 1953.
3. Bonnett, H. T.; Calhoun, D. W., J. Chem. Doc. 1962, 2, 2-6.
4. Johns, T. M.; Clare, M., J. Chem. Inf. Comput. Sci. 1982, 22, 109-113.
RECEIVED March 11, 1986