J. FITTING, H.LEHNA,AND K. SPECHT
Plausibility Check of Chemical Coding by Computer Using a Modified Derwent Robins Program J. FITTING, H. LEHNA, and K. SPECHT' Research Laboratories, Schering AG, Berlin/Bergkamen, Germany Received December 27. 1973
The checking of chemical coding (Ringcode) by computer Is described. The check is based on the normal search procedure for Ringcode coded files. Therefore the different logical conditions can be changed easily. Adoption to every chemical Ringcode is possible.
The plausibility check gives us the automatic possibility of finding errors in chemical coding. The coders get alert information about type and frequency of their own mistakes. This leads to optimal coding work. Time and.costs consumed in manual checks of the coding quality are reduced to spot checks. The main reasons for incorrect files are 1. Wrong interpretation of publications to be coded 2. Punching errors 3. Coding errors The wrong interpretation of publications cannot be checked by computer, but, in this area, spot checks are sufficient to prove the standard of a coder. Punching errors which occur nonsystematically can be found to a certain extent, but with an obligatory double check of the keypunching this type of error will be very rare. Pure coding errors can be discovered by computer because of the logical connection of the different fragments. This detection will of course not be total, but a considerable number of mistakes will be found. At first, the different logical correlations between the fragments of the chemical codes are determined. There are two possibilities to proceed: 1. to write a program which considers these correlations to a certain extent 2. to use a retrieval program and to define the correlations similar to the procedure for computer search runs We decided on the second possibility because to us this solution seemed to be more flexible and has the option to be transferable to every similar chemical fragmentation code used now and in the future. We combined a certain sum of check queries. After having them tested and evaluated we combined 31 Check Queries for the General Chemical Code 15 Check Queries for the Steroid Code for the plausibility checks. To illustrate the plausibility check method, an example is given: Figure 1 shows the encoding of a chemical compound with the General Chemical Code. The coding is wrong; the position 7/11 is missing. Using the plausibility check for the General Chemical Code, an error message will be given by the computer: SEVERAL X IN ONE RING Simultaneously a card with the wrong coding will be punched; the corresponding file on the tape is obliterated. The logical correlations for this mentioned error are as follows. The positions 812-817 define the positions of het* Author to whom correspondence should be addressed
76
Journal of Chemical Documentation, Vol. 14, No. 2, 1974
eroatoms in rings with more than one heteroatom. Additionally the positions 7/11 or 714 must be punched. Therefore, we use the following check formula (812 v ai3 v ai4 v ai5 v 816 v 8/71 A (7111v 7/41 "
I
A
-
or if A, then B ( A
-
B B1
On the other hand, if the fragments 7/11 and 7/4 are present, there must be statements in the positions 8/2-8/7, too. Therefore, the logical condition if B, then A (B
+
A1
is correct, too. The condition is alternate. If we use the following definition of a query ( A A E)
V
(A
A
B)
we shall get no error message if both, A and B, are present, and if both are absent. Considering the fact that very frequently coordinate coding terms are both missing, we found it necessary not to check only A and B alternately but to check additionally one or both by further check positions: Condensed heterocycles, for example, have the punch positions (311 V 4/d V 6/11)
c and always need statements in 814-817 (A'). A' has two more possible positions than A. The new logical condition is if C, then A' (C
-
A')
Because A' = ( A
V
816
V
8/11 = > A
it is not possible to detect quantitatively the combined missing of A and B. But the possibility of overlooking erroneous codings is remarkably limited. Generally we try to set up a very fine mesh when we define the check queries. Figure 2 shows all the punch positions which may be checked using our question set with regard to the coding of the substance shown in Figure 1. Some changes of the Derwent Robins Program had to be made. For example, it was necessary to check every card of a file. To identify the erroneous cards, the card's sequence numbers have been included in the address. Files which do not have numbered cards will be numbered by the pro-
CHECK OF CHEMICAL CODING BY A
MODIFIEDDERWENTROBINSPROGRAM
P
Klart e x t :
316
116
216
ANORGAN IUWANI
llGlNl(H
516
616
716
616
719
819
618
E7 619 I
17/12
18112
1
19112
-
21112
PO1Y
SOL7
20111
iq-iqx AN F
GRUPPt AN F
2010
(4;OH
R-NH;
I
20112
jj?Jq-E AN f
1
I
Am3 RIIRO
I
HfllRO (KLUI
p. 28
21111
2211 1
H
GRUPPi AN F
R
2011
2111
C'
2,%14010. AKIlVt -2610 On, O M 01 SH, IM IS
R-NHF
2012
2112
>P-
OH
'p,
R-NRT
1915 R-NR?
1916
R-N(CH;
2,014,
/Si, 2015 \
92016 GWPt I RING
2017
y=o 1918
N+ rrcllstn 1719
I
1819
1919
HOYOLDGI
AllG
OZOHID 4110
NHR
2113
2213
2613
OR
KtIIt
NR2
2114
2214
SH, SI 2115 OM, SM 01,
sr
2116
NTz 2117
-0F-I
-OI--S'
OH, OM
-
g
2216
ANDIRt HtTtRO
2218 IDllSllGf
2019
2119
22/9
0-C
2713
3013
2714 si
4 3015 Andere
2715
P 5
Hexosan
R+ x12617
N-0
-0-s
2
c _
2217
-
Mg
Hg -
2616
-
2118
7
Sp 17 ( :y ( *-Y
or
Y=S
HOMOLO(
2614
0X.NX.I
2215 OR,
LI
2712
-
NTZ
2018
Y=NT
No
I
I
1914
2711
"2
2612
I
2013 'y
2611
1 --
1912
30112 Ribose
2711 1
2110
:S=Y
Sp. 29
\I
I
k l
IPURtN
YOMPLEX
ADO VtRB
2719
-
MI1A L l O R l
HOMOLOCI
ANOlRl
31
32
33
62
65
66
n =MISTAKE Figure 1.
Journal of Chemical Documentation, Vol. 14, No. 2,1974
77
-* J. FITTING,H. LEHNA,AND K. SPECHT
Klart e x t :
11/12
IC OM Y LDUL SRS. E I
c
fOLY
12/12
1311P
14/12
15/12
16/12
POLY
11111
2
Ill0
1210 I
3
-fH
I
1111
D
1211,
4
1112
114 KUNITIT U A POLYM
1 1 21f,,
6b
913
3N
,
1013
-(I
7
1212 (H3X
3/10 1113
1213
'1/16 -(H,N 'KtTTti
-1114
3/43
IIOlItRUNG
1.2
3
12/4
217 7
115
215
915
315
1.3
1015
1115
1
1.34
I
I
1116 ANORGAN IUBITANZ
KiARTlXT
tlGtNICU
DARII
zs
2
317
1117
1217
3
$tX
PRY1
9/8
11018
1118
4/6 1119
17112 POLY
I I 18112 POLY
19112
20112
21/12
-
22112
20111
21111
-
g
22111
GRUPPt
2110
,S=Y 2011
-N = N-
W O M , OR I H , IM. I R
2211
2611
=NPN
NHz
2013
2113
2213
2613
>p:y
OR
KtTTt
N Rz
2015 \
,B-
2016 GRUPPt I RING
2215
OM, IM
01,ST
2116
NTz
2614 1X.NX.I)
OH, OM OR, 01
Anmt
R'X-
HtItRO
2617
2217 N-0
2118 -0-Si
2218
2618
IONIliGf
KOMPltX ADD VERB
2119
2219
HOMOlOGt
iOMOLOG1
R' R-
2619
-
-
MULLLOR[ RADIKPL
4
2716
5 6
2717 HAL
7
2718
8
2719 ANOlRl
31
@ =CHECKABLE
Figure 2.
Journal of Chemical Documentation, Vol. 14, No. 2 , 1974
SI
-
--
-
-
78
2714
P S
2616
2018
2019 y = NT
Mannose
-
2216
2117 -os--ss.
Y=S
p-
2715
2017 y=o
-05--IS-
Galactose
--
NTZ
-
/Joi? Glucose
-
2214
2115
Yonodssoxipintom
NHR
-
SH,SR
ANOtRt
2612
2212 + -NsN
2114
24esoxyribose
-
OH
2f14, ,Si,
2112
2610
2210
>P-
2012
-NR]
2111
Ribose
H
AN f -
2010
3011 2
26112
POL(
POLY
E7 -
32
33
62
65
66
DRS-A
USER ORIENTEDINFORMATION RETRIEVALSYSTEM
Coded compound
Punch positions
output
erroneously absent
OH
a-CH-CHz-0
6/12
6/6
100825
01
0 2 9 ISOLATED R I N G
0 1 4 UNSATURATED HETEROCYCLE
\
do
0
10437s
2/1 25/11
25/0
01
035
SIDE CHAIK
0 4 4 COL.
25/7
104375
23/7
02
25
003 Y I N COL. 2 3
Figure 3.
gram. A new listprogram was written. This program allows the printing of accession number and card number of erroneous cards together with all mistakes (see Figure 3). Contrary to a fixed check program, the procedure de-
DRS-A
scribed here gives the possibility of changing the logical conditions very easily. Thus it is possible to consider the changing of coding rules a.s.0. without changing the program.
User Oriented Information Retrieval System GEZA SZONYI
Polaroid Corporation, Cambridge, Massachusetts 02139 Received December 6, 1973 DRS (Data Retrieval System), a user-oriented computerized information retrieval system, is described. The system allows data base generation and information retrieval by users without programming and systems knowledge. As an illustration, a chemical information system using the Wlswesser Line Notation (WLN) is generated, and retrieval examples are given using this data base. Because of its easily understandable English command language, DRS is suitable for a wide variety of information retrieval systems. Polaroid was instrumental in many of the advanced features of DRS developed during the past three years.
FAMULUS, a user-oriented computerized information retrieval system was presented in the fall of 1969.l This system, consisting of eight main and three peripheral computer programs, allows the user to set up data bases with relative ease. Having done-this, data manipulation, such as addition, correction, and deletion of information units or records, can be carried out relatively simply. Subsequently, searching of the data base by Boolean logic or by keywords is accomplished by English commands, and the retrieved subset can then be sorted alphabetically. In addition, data subunits or files can be merged, and printouts of the retrieved subset can be displayed in several output formats.* The system was mainly designed to handle personal reference collections but appeared to have applicability for all types of computerized information retrieval systems3 Unfortunately, the system was designed for IBM System/S60
Control Data Corp. 6400 and 6600, UNIVAC 1108, or Xerox Sigma 7 type computers.',* These are large machines and not particularly suited for operation by individual users without system and programming knowledge. Also, searches with this system are carried out in the batch mode, rather than interactively, making the information retrieval rather awkward. A more serious drawback of FAMULUS is its magnetic tape orientation; i.e., searching is done sequentially rather than a t random. In addition, typical data bases contain 3000-5000 records, each record having about 4000 characters.* We were interested in FAMULUS-type retrieval systems, i . e . , having all the attributes of such a system, but one which could be implemented on minicomputers, since these could potentially be operated by single users without system or programming staff support. Further, data bases Journal of Chemical Documentation, Vol. 1 4 , No. 2, 1974
79