Molecular Structure Input Program Using a Storage Cathode Ray Tube Terminal

William E. Brugger and Peter C. Jurs
Department of Chemistry, The Pennsylvania State University, University Park, Pa. 16802

The enormous amount of information now being generated can best be handled by a computer programmed for this purpose. During the past decade, computer systems have been developed for information storage and retrieval. However, preparing information for computer handling is slow and expensive, especially when chemical structures are involved. Transforming molecular structure diagrams into forms a digital computer can handle remains a major difficulty for the small research facility and the occasional user. Though several methods are presently available, they involve lengthy learning procedures, expensive hardware, or long preparation times for the input. For a method to be of practical use to the occasional user, all of these limitations must be minimized. Linear notation methods such as Wiswesser, IUPAC, and Hayward are often used for chemical information retrieval systems; several variations of connection tables are also used. Neither method of representation requires more than a keypunch to prepare data cards for a molecular structure; however, linear notation methods require learning symbols and ciphers to encode structural diagrams, and connection tables are tedious to encode and carry a high probability of error (1, 2). Chemical typewriters make structure input convenient, and they can record atom coordinates as well as the structure itself. However, chemical typewriters require special training to operate, and they are economically feasible only for large operations that are continually coding structures (3-5). In the past 10 years, interactive computer graphics has replaced other means of entering structural diagrams. Devices such as RAND Tablets and light-pens coupled with one or more cathode ray tubes (CRT) allow the user to enter the structure, including coordinates, in a convenient form.
Real-time operations such as rotation or scaling can be performed on the molecular structure diagram as well. Graphical input devices let the user check for errors visually and correct them. They are especially desirable because the natural language of chemists consists of graphical representations of structures. However, the hardware cost is often prohibitive for the small facility (6-9). A system using a computer-controlled television camera was recently reported (10); however, it is probably not the answer for the small facility or the occasional user. The development of low-cost storage CRT graphic display terminals makes it possible for small facilities to have an interactive graphics unit that can often replace a teletype as an input device. Although software exists for standard graphics operations (e.g., point plotting, line drawing, screen scaling) on these storage CRT units, the input of molecular structure diagrams requires a special program. Such an input program has been developed and is described below.

ROUTINE UDRAW

Routine UDRAW was developed using a Tektronix 4010-1 computer-controlled display terminal and the Tektronix “PLOT-10” software, which performs standard graphics operations. The display terminal is equipped with an input cursor that allows the operator to indicate a point on the display screen; it is this cursor that lets the user interact with the program and construct chemical structure diagrams. Thumbwheels located next to the keyboard control the location of the cursor. The display terminal is controlled by a 16-bit-word MODCOMP II digital computer having 48K words of memory. However, with minor modifications to the routines that act as the interface between UDRAW and the display terminal, the program could be run on any dedicated or time-sharing system that can communicate with the display terminal and execute FORTRAN programs. UDRAW is written in standard FORTRAN, is therefore independent of word size, and requires about 4.2K words of memory in its present form. The supporting PLOT-10 routines required by UDRAW occupy an additional 2K words of memory.

A general flow diagram for routine UDRAW is given in Figure 1. The first section initializes the storage arrays and displays the directive menu. Once the first molecular diagram has been entered, the initialization section is bypassed and the previous entry is redrawn. The operator then has the choice of either modifying the structure or initializing the arrays and entering a new molecule. This feature allows fast encoding of data sets containing similar molecules.

Figure 1. Flow chart of routine UDRAW (surviving box labels include Enter Structure, Enter Information, Calculate Stereochemistry, and Alter Structure)

ANALYTICAL CHEMISTRY, VOL. 47, NO. 4, APRIL 1975

Table I. Codes Used by UDRAW

| Recognizable atom type | Numeric code | Keyboard character |
|---|---|---|
| Carbon | 1 | Space |
| Oxygen | 2 | O |
| Nitrogen | 3 | N |
| Sulfur | 4 | S |
| Fluorine | 5 | F |
| Chlorine | 6 | C |
| Bromine | 7 | B |
| Iodine | 8 | I |
| Phosphorus | 9 | P |

| Bond type | Numeric code | Keyboard character |
|---|---|---|
| Single | 1 | 1 |
| Double | 2 | 2 |
| Triple | 3 | 3 |
| Aromatic | 4 | 4 |

The procedure for encoding a structure such as acrylic acid is simple and straightforward. When the routine enters the connection table section of UDRAW, the cursor appears. The operator moves the cursor with the thumbwheels to the desired position of the first atom, presses the space bar to indicate a carbon atom, and then presses RETURN. The cursor reappears, ready to accept the location of the second atom. After moving the cursor to the desired location of the second atom, the user enters “Space”-RETURN to indicate another carbon atom. This is followed by “2”-RETURN, which generates a carbon-to-carbon double bond. Since the atom last entered is a member of the next bond, the cursor is left at the previous position, and “Space”-RETURN designates this atom as the first atom of the second bond. The cursor is then moved to the desired position of the third carbon atom of the molecule; “Space”-RETURN followed by “1”-RETURN generates a single bond between carbons two and three. The double bond to the oxygen is generated by first pressing “Space”-RETURN while the cursor is on atom three. Then, after the cursor is moved, “O”-RETURN is entered to indicate an oxygen atom, and “2”-RETURN designates a double bond. Since the final atom is also connected to atom three, the cursor is first positioned over atom three and “Space”-RETURN is entered, marking that atom as the first of the final pair. The cursor is then positioned for the final atom, and “O”-RETURN followed by “1”-RETURN generates the carbon-to-oxygen single bond. Since the structure is complete, “D”-RETURN is entered, which causes the routine to branch to the directive input section of UDRAW; in this case the “FINISH” command would then be given. If the molecule were entered from left to right, the display on the screen at the end of the input procedure would appear as follows:

    C=C-C=O
        |
        O
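The keystroke sequence above can be sketched in Python to show how a connection table falls out of it. This is a minimal illustration, not UDRAW's FORTRAN: it assumes each bond arrives as two atom entries (a screen position plus a Table I key) followed by a bond key, and that an entry placed within a small screen distance of an existing atom re-selects that atom. The function name `build_connection_table` and the tolerance `tol` are hypothetical.

```python
from typing import Dict, List, Tuple

# Keyboard characters and numeric codes taken from Table I (" " is the space bar = carbon).
ATOM_KEYS: Dict[str, int] = {" ": 1, "O": 2, "N": 3, "S": 4, "F": 5,
                             "C": 6, "B": 7, "I": 8, "P": 9}
BOND_KEYS: Dict[str, int] = {"1": 1, "2": 2, "3": 3, "4": 4}

Point = Tuple[float, float]


def build_connection_table(bond_entries, tol: float = 5.0):
    """Turn (atom entry, atom entry, bond key) triples into atoms and bonds.

    Each atom entry is ((x, y), key). Hypothetical rule: an entry within `tol`
    screen units of an already-entered atom re-selects that atom.
    """
    atoms: List[Tuple[int, Point]] = []      # (numeric atom code, screen x-y)
    bonds: List[Tuple[int, int, int]] = []   # (atom number, atom number, bond code)

    def locate(pos: Point, key: str) -> int:
        for idx, (_, (x, y)) in enumerate(atoms):
            if (x - pos[0]) ** 2 + (y - pos[1]) ** 2 <= tol ** 2:
                return idx                    # cursor is on an existing atom
        atoms.append((ATOM_KEYS[key], pos))   # unknown keys raise KeyError
        return len(atoms) - 1

    for (pos1, key1), (pos2, key2), bond_key in bond_entries:
        i = locate(pos1, key1)
        j = locate(pos2, key2)
        bonds.append((i + 1, j + 1, BOND_KEYS[bond_key]))  # 1-based atom numbers
    return atoms, bonds


# Acrylic acid, entered left to right as in the text (screen positions are made up).
entries = [
    (((100, 100), " "), ((150, 100), " "), "2"),  # C1=C2
    (((150, 100), " "), ((200, 100), " "), "1"),  # C2-C3
    (((200, 100), " "), ((240, 130), "O"), "2"),  # C3=O4
    (((200, 100), " "), ((240, 70), "O"), "1"),   # C3-O5
]
atoms, bonds = build_connection_table(entries)
print([code for code, _ in atoms])  # [1, 1, 1, 2, 2]
print(bonds)                        # [(1, 2, 2), (2, 3, 1), (3, 4, 2), (3, 5, 1)]
```

Re-selecting atoms by cursor position is what lets the operator chain bonds from an atom already on the screen without retyping it.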

As a molecular structure is entered, a chemical connection table is generated that contains the identity of each atom, the connectivity of each atom, and the bond type of each connection. Two-dimensional screen coordinates for each atom are also stored and are converted into Angstrom coordinates in the final section of UDRAW. The atom and bond types recognized by UDRAW are listed in Table I, along with the numeric codes used in the connection tables and the keyboard characters used to enter each atom or bond type. Modifying this list requires changing only four parameter statements, so the program is adaptable to a wide variety of molecular diagrams. In addition to the connection table section, a ring information section and a double bond stereochemistry section have been included. The information from these program modules may not be necessary for chemical data retrieval systems, but it is used in molecular modeling and substructure searching programs.
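A connection table of the kind described here can be illustrated with a short Python sketch. It assumes a per-atom row layout (atom number, numeric atom code, list of neighbor/bond-code pairs), which is an illustrative choice rather than UDRAW's actual array layout; keeping the Table I codes in one lookup table plays the role the paper assigns to its four parameter statements. The names `connection_table` and `ATOM_NAMES` are hypothetical.

```python
from collections import defaultdict

# Numeric atom codes from Table I mapped to element symbols (for display only).
ATOM_NAMES = {1: "C", 2: "O", 3: "N", 4: "S", 5: "F",
              6: "Cl", 7: "Br", 8: "I", 9: "P"}


def connection_table(atom_codes, bonds):
    """Return one row per atom: (atom number, atom code, [(neighbor, bond code), ...]).

    `bonds` holds (i, j, bond code) triples with 1-based atom numbers; each bond
    is recorded for both of its atoms so every row lists that atom's full connectivity.
    """
    neighbors = defaultdict(list)
    for i, j, bond_code in bonds:
        neighbors[i].append((j, bond_code))
        neighbors[j].append((i, bond_code))
    return [(n, code, sorted(neighbors[n]))
            for n, code in enumerate(atom_codes, start=1)]


# Acrylic acid, atoms numbered in the order they were entered.
atom_codes = [1, 1, 1, 2, 2]
bonds = [(1, 2, 2), (2, 3, 1), (3, 4, 2), (3, 5, 1)]
for number, code, links in connection_table(atom_codes, bonds):
    print(number, ATOM_NAMES[code], links)
```

Storing each bond under both atoms doubles the bond entries but lets a program read any atom's neighbors and bond types from a single row.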


As the operator points out ring atoms with the cursor, the routine enters them into the ring information storage arrays. The atom number of each ring atom and the size of the ring to which it belongs are stored for later use. If benzene rings are the only ring systems present, the algorithm calculates the ring information automatically, and no operator intervention is necessary. Naturally, this section could be replaced with a ring-finding algorithm, if desired. The section that calculates the stereochemistry about double bonds is completely operator-independent once the structure has been entered. It consists of a search of all bonds to locate double bonds, a check that each double bond is not a terminal bond, and, finally, a calculation of the stereochemistry about the interior double bonds. Because this calculation uses the screen coordinates of the atoms, the operator must draw the correct configuration of the molecule on the screen. The calculation takes the dot product of the single bonds attached to the two atoms forming the double bond. Suppose 2-pentene were drawn with C1 and C4 on the same side of the C2=C3 double bond.

The bond from C2 to C1 can be represented as a vector $\mathbf{A}$, and the bond from C3 to C4 as a vector $\mathbf{B}$. The dot product of two vectors is defined as $\mathbf{A}\cdot\mathbf{B} = |\mathbf{A}|\,|\mathbf{B}|\cos\theta$, where $\theta$ is the angle between them. Since $\cos\theta$ is negative for $\theta$ greater than 90° and positive for $\theta$ less than 90°, the sign of the scalar quantity $\mathbf{A}\cdot\mathbf{B}/(|\mathbf{A}|\,|\mathbf{B}|) = \cos\theta$ indicates whether the configuration is cis or trans. In this case $\cos\theta$ is greater than zero, so the configuration is cis.

Once a molecular structure has been entered, changes can be made using the alteration section of UDRAW. The operator can change any atom identity or any bond type, and, if desired, an atom can be deleted from the structure entirely. Errors in the ring information are easily rectified. Double bonds drawn in one stereochemical configuration can also be changed to the other by simply indicating the bond to be changed. To check that the alterations have been entered correctly, the entire molecular structure may be redrawn at any time.

Upon input of the “FINISH” directive, the routine first checks flag variables that indicate whether the user has entered the ring and stereochemistry sections. If a flag shows that a section has not been entered, that section is run and the necessary calculations are performed. Next, a set of coordinates is calculated for each atom from its screen coordinates. The conversion factor is obtained by dividing the carbon-to-carbon single bond length, 1.54 Angstroms, by the average length of the carbon-to-carbon single bonds in screen coordinates. This gives a new set of two-dimensional coordinates in the x-y plane; to add the third dimension, the z = 5.0 plane is used for all atoms. Deriving the factor from the drawn bonds allows more precise scaling than assigning one fixed factor for all conversions. Throughout the routine, checks are made to ensure that plausible information is being entered.
Whenever an error is encountered, the display terminal bell is rung and the routine branches back to the point where the error was made. Also, as data are accepted, they are echoed back to the terminal as atom symbols, bonds, or numbers indicating ring information. In this way, constant visual feedback is presented to the user and incorrect data are not accepted.
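The two coordinate calculations described above, the dot-product test for double bond stereochemistry and the screen-to-Angstrom scaling performed at “FINISH”, can be sketched in Python. This is an illustration under stated assumptions, not UDRAW's code: the function names are hypothetical, bond indices are 0-based, and returning "undetermined" when the two single bonds are drawn exactly perpendicular is an added choice the paper does not discuss.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def double_bond_config(c1: Point, c2: Point, c3: Point, c4: Point) -> str:
    """Classify the C2=C3 double bond from 2-D screen coordinates.

    A runs from C2 to C1 and B from C3 to C4; the sign of
    A.B / (|A| |B|) = cos(theta) gives cis (> 0) or trans (< 0).
    """
    ax, ay = c1[0] - c2[0], c1[1] - c2[1]
    bx, by = c4[0] - c3[0], c4[1] - c3[1]
    cos_theta = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    if cos_theta > 0:
        return "cis"
    if cos_theta < 0:
        return "trans"
    return "undetermined"  # assumption: a perpendicular drawing is reported, not guessed


def screen_to_angstrom(coords: Sequence[Point],
                       cc_single_bonds: Sequence[Tuple[int, int]],
                       cc_length: float = 1.54,
                       z: float = 5.0) -> List[Tuple[float, float, float]]:
    """Scale screen coordinates so the average drawn C-C single bond is 1.54 A,
    then place every atom in the plane z = 5.0 (bond indices are 0-based)."""
    if not cc_single_bonds:
        raise ValueError("need at least one C-C single bond to set the scale")
    lengths = [math.dist(coords[i], coords[j]) for i, j in cc_single_bonds]
    factor = cc_length / (sum(lengths) / len(lengths))
    return [(x * factor, y * factor, z) for x, y in coords]


# Hypothetical screen drawings of 2-pentene's C1..C4 (C5 is not needed for the test).
print(double_bond_config((0, 40), (20, 0), (60, 0), (80, 40)))   # cis
print(double_bond_config((0, 40), (20, 0), (60, 0), (80, -40)))  # trans
print(screen_to_angstrom([(0, 0), (50, 0), (100, 0)], [(0, 1), (1, 2)]))
```

Only the sign of the dot product matters for the cis/trans decision, so normalizing by the bond lengths is not strictly required; it is kept here to mirror the cos θ form in the text.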

RESULTS AND CONCLUSIONS

In a practical application of UDRAW, a data set of 64 ring-containing compounds was encoded. The average number of atoms per molecule in this data set was 18. The average time to encode each molecule was slightly less than 3 minutes, including all operations such as compound labeling, screen setup, error correction, and structure redrawing. Preparing the same data for card input would require far more time; in the particular case of morphine, four minutes or more can be saved by using UDRAW rather than card input. Naturally, the time saved by using UDRAW decreases as the number of atoms per molecule decreases. However, the convenience of using conventional molecular structure diagrams and the ease of error correction are always present. Overall, the routine gives the small facility and the occasional user a convenient input method for encoding structural diagrams, in that:

1) Hardware requirements are small and relatively inexpensive.
2) Conventional chemical structural diagrams are used rather than special codes or symbols.
3) Scaled three-dimensional coordinates are automatically generated for each atom.
4) Errors are easily detected by visually checking the input diagram against the original.
5) Errors are easily corrected.
6) The mechanics of using the routine are easy and quick to learn.

However, inherent in this graphics input system are the sacrifice of screen display size and the unfamiliar use of thumbwheel controls.

This routine is presently being used as an input subroutine for a molecular modeling program and to enter large data sets of organic molecules into files. The routine is not used for information retrieval only because the need has not arisen; there is no reason why UDRAW could not be used for this purpose. A section to handle the stereochemistry about asymmetric carbon atoms and a ring-finding algorithm are presently being considered and will be added in the near future. Routine UDRAW will gladly be given to interested readers upon request.
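The paper does not say which ring-finding algorithm was planned. As one possible stand-in (a substituted technique, not the authors' method), the following Python sketch finds the smallest ring through each atom by breadth-first search: every atom reached from the start atom is labeled with the neighbor its search path began with, and a bond joining two differently labeled atoms closes a ring through the start atom. It records only the smallest ring per atom, so an atom shared by fused rings could need additional handling; the name `smallest_ring_sizes` is hypothetical.

```python
from collections import deque


def smallest_ring_sizes(n_atoms, bonds):
    """Smallest ring size through each atom (0 if acyclic); atoms are numbered 1..n."""
    adj = {a: [] for a in range(1, n_atoms + 1)}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)

    sizes = {}
    for v in adj:
        dist = {v: 0}
        branch = {v: None}  # which neighbor of v each BFS path starts with
        queue = deque([v])
        best = 0
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    branch[w] = w if u == v else branch[u]
                    queue.append(w)
                elif w != v and u != v and branch[w] != branch[u]:
                    # This bond joins two different branches: it closes a ring through v.
                    length = dist[u] + dist[w] + 1
                    if best == 0 or length < best:
                        best = length
        sizes[v] = best
    return sizes


# Methylcyclohexane: ring atoms 1-6, methyl carbon 7 attached to atom 1.
ring_bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 7)]
print(smallest_ring_sizes(7, ring_bonds))
```

Running one search per atom keeps the logic simple; for molecules of the size encoded here (about 18 atoms on average) the repeated searches are cheap.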

LITERATURE CITED

(1) M. F. Lynch, J. M. Harrison, W. G. Town, and J. E. Ash, "Computer Handling of Chemical Structural Information," Macdonald, London and American Elsevier, New York, N.Y., 1971.
(2) W. T. Wipke, S. R. Heller, R. J. Feldmann, and E. Hyde, "Computer Representation and Manipulation of Chemical Information," Wiley-Interscience, New York, N.Y., 1974.
(3) J. M. Muller, J. Chem. Doc., 7, 88 (1967).
(4) Ronald Cottardi, J. Chem. Doc., 10, 75 (1970).
(5) A. Feldman, J. Chem. Doc., 13, 53 (1973).
(6) I. E. Sutherland, Sci. Amer., 215 (3), 86 (1966).
(7) E. J. Corey and W. T. Wipke, Science, 166, 178 (1969).
(8) C. D. Farrell, A. R. Chauvenet, and D. A. Koniver, J. Chem. Doc., 11, 52 (1971).
(9) R. J. Feldmann, S. R. Heller, and K. P. Shapiro, J. Chem. Doc., 12, 41 (1972).
(10) W. S. Woodward and T. L. Isenhour, Anal. Chem., 46, 422 (1974).

RECEIVED for review October 9, 1974. Accepted December 6, 1974. The financial support of the National Science Foundation is gratefully acknowledged.

Inexpensive Septumless Injector for High Pressure Liquid Chromatography

D. A. Usher, A. H. McHale, and David Yee
Department of Chemistry, Cornell University, Ithaca, NY 14853

In high pressure liquid chromatography, it is common practice to introduce the sample by syringe injection through a septum, which is often made of silicone rubber. This system can suffer from several disadvantages: the needle may become blocked by septum material, or, if injection is made against high pressure, part of the sample may be blown back past the syringe plunger seal. There is also the possibility that the pierced septum may leak under the high pressures often used in modern liquid chromatography; to guard against this, frequent replacement of the septum may be necessary. In all of these cases, accurate quantitation would be impossible. To circumvent these problems, a number of manufacturers have made septumless injection systems of various designs available. These

Figure 1. The septumless injection port shown disassembled (left) and assembled (right); the unit is used in the horizontal position as shown.