Mechanized Searching of Phosphorus Compounds: III. Punched

J. Frome, F. M. Sikora, and M. Gannon. J. Chem. Doc. , 1961, 1 (1), pp 88–90. DOI: 10.1021/c160001a023. Publication Date: January 1961. ACS Legacy A...
MECHANIZED SEARCHING OF PHOSPHORUS COMPOUNDS. III. PUNCHED CARDS VS. RANDOM ACCESS COMPUTERS*

By J. FROME, F. M. SIKORA, and M. GANNON

U. S. Department of Commerce, Patent Office, Washington, D. C.

During the past six months patent applications for organic phosphorus ester compounds have been mechanically searched by the two separate systems previously described. To date about 200 searches on the RAMAC and 102 searches on the multicolumn sorter have been made on a file of 693 patents. While it is realized that this may be too small a sample on which to base ultimate conclusions, certain trends are unmistakable and appear worthy of mention at this time.

The Office of Research and Development in the U. S. Patent Office is charged not only with investigating new methods for mechanizing the patent search for patent examiners, but also with making these new methods and their search files available to the public when practicable. Therefore two search systems were developed and investigated. The two systems are not, however, compared in this report for the purpose of determining which system is better on an over-all basis; that ultimate decision depends entirely upon the needs of the user. The two systems are compared merely to disclose the advantages and disadvantages of each, and to give some idea of how pronounced these advantages and disadvantages actually are.

To judge a mechanized search system several important criteria are necessary: (1) cost, as to (a) file preparation, (b) machine rental, (c) machine operation; and (2) effectiveness, as to (a) number of descriptors, (b) number of documents retrieved, (c) noise. (Noise refers to those documents retrieved in a mechanized search which do not answer the question asked, on account of some peculiar feature of the system.) The cost of mechanizing any specific body of chemical subject matter and the cost of the machine are the two most important factors to be considered in any mechanized search system.
A comparison between the cost of the two systems may best be illustrated by a table.

TABLE I

System                                        RAMP (Computer)    CAMP (Punched Card)
Machine used                                  RAMAC              Multicolumn sorter
Machine rental cost ratio                     9                  1
File cost ratio (cost not common to each)     2 1/2              1
Operating personnel cost ratio                1 1/3              1
No. of cards used                             100                1

It is observed from this table that the computer rental is about nine times greater than that of the punch card machine. This is without doubt the most significant single cost factor at the present time. File preparation cost in the computer system is also 2.5 times greater. This does not take into consideration the additional cost which is common to both, i.e., writing structural formulas. Because the ordinary punch card machine is much less complicated to operate, personnel with a minimum of machine training are used, whereas a school-trained technician is required to operate the larger, more complex electronic computers.

The other very important factor to be considered is the effectiveness of the system. The searches described are examples which illustrate the varying features of each system in terms of its effectiveness. The search questions have been purposely disguised so as not to reveal the subject matter of the patent applications involved.

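(A) Actual Search:

[Structural diagram of the disguised compound not recoverable from the source.]

Questions asked (codes in source order; their assignment to the RAMP and CAMP columns is not recoverable):
(1) J-CH-1; (1) J-R-1; Phosphorus nucleus -X; X3-L; I node; X1-J; II node; X1-K; Term. node

                                    RAMP       CAMP
1. Time to ask question             4 min.     3 min.
2. Time of search                   3 min.     2 min.
3. No. of documents retrieved       19         12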

*Presented before the Division of Chemical Literature, ACS National Meeting, New York, September 13, 1960.

(B) Actual Search:



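[Structural diagram of the disguised compound not recoverable from the source.]

Questions asked (codes in source order; their assignment to the RAMP and CAMP columns is not recoverable):
(1) P-X-A-1; Phosphorus nucleus; X2-R; I node; X-A; I node (Specifically)

                                    RAMP       CAMP
1. Time to ask question             4 min.     3 min.
2. Time of search                   3 min.     2 min.
3. No. of documents retrieved       12         48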

(C) Actual Search:

[Structural diagram of the disguised compound not recoverable from the source.]

Questions asked (codes in source order; their assignment to the RAMP and CAMP columns is not recoverable):
(1) T-CH-3; P-X-N-3; 13 Frag; Phosphorus nucleus; (1); I node; X3-Q; X3-T; II node

                                    RAMP       CAMP
1. Time to ask question             4 min.     3 min.
2. Time of search                   3 min.     2 min.
3. No. of documents retrieved       13         13

It will be observed that for the more complex ester type compound (A) searched, the CAMP system was more effective for roughly the same period of searching time, in that it retrieved only 12 documents in comparison to the 19 retrieved by the RAMP system. However, for the simpler type of ester compound (B) the situation was reversed, with the RAMP system retrieving 12 documents in comparison to the 48 documents found by the CAMP system for a similar time of search. The intermediate type compound (C) gave exactly equivalent results in terms of number of documents retrieved over a similar period of time.

Table II discloses some over-all average results based on the 302 searches made in the 52 patent applications.

TABLE II

                                         RAMP       CAMP
No. of documents retrieved/search        10         17
Noise                                    Little     Appreciable
No. of descriptors possible              50,000     960
Degree of specificity and genericity     Great      Limited

It is observed that for approximately equivalent times of search (2 minutes on the CAMP system and 3-4 minutes on the RAMP system) the RAMP system retrieved 10 patents per question while the CAMP system retrieved a surprisingly comparable 17. The answers to the RAMP system also revealed very little noise. By "noise" we mean those documents retrieved in a mechanized search which do not answer the question asked, on account of some peculiar feature of the system. The answers in the CAMP system, however, did reveal a considerable amount of noise, undoubtedly as a result of the compositing feature of putting all information describing a total of 25 compounds (average) on one punch card.

Table II also points out the difference in the number of descriptors which can be used to describe the organic phosphorus compounds in each system. For example, the 50,000 descriptors in the RAMP system enable us to define each alkyl radical in a compound specifically in terms of whether it is an ethyl or a butyl radical, also whether it is straight or branched chain, and finally whether it is alkyl. In the CAMP system the 960 descriptors limit us to defining the above radicals in terms of the generic descriptor "alkyl" only. Similarly the radical "bicycloheptane" in the RAMP system is defined both as bicycloheptane and cycloalkyl, whereas in the CAMP system it is defined only as cycloalkyl. It is thus seen that the RAMP system, because of its degree of specificity, is particularly outstanding for finding relatively simple specific organic phosphorus compounds such as triethyl phosphate, tricresyl phosphate, etc., and separating them from their homologs. Of course it also can retrieve complex compounds.

The results obtained from testing both systems at various stages of file size also forecast some future trends as illustrated in Table III.

TABLE III

                                         RAMP       CAMP
Future increase in file expected         Great      Great
Increased time to search                 Little     Great
Memory                                   Limited    Unlimited
Increased noise                          Little     Greater
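The noise attributed earlier to compositing can be illustrated with a small sketch. It is not the Patent Office's coding scheme: the descriptor names and the two sample patents are hypothetical, and the per-compound search is only an assumed point of contrast, not a description of how the RAMAC file was organized. The sketch shows how merging the descriptors of several compounds onto one record lets a query match a patent in which no single compound satisfies it.

```python
from typing import Dict, List, Set

# Hypothetical data: each patent lists the descriptor set of each compound.
patents: Dict[str, List[Set[str]]] = {
    "patent_1": [{"phosphate", "ethyl"}, {"thiophosphate", "phenyl"}],
    "patent_2": [{"phosphate", "phenyl"}],
}


def search_composited(query: Set[str]) -> List[str]:
    """One record per patent: descriptors of all its compounds are merged."""
    hits = []
    for name, compounds in patents.items():
        card = set().union(*compounds)  # the composite record
        if query <= card:
            hits.append(name)
    return hits


def search_per_compound(query: Set[str]) -> List[str]:
    """One record per compound: a single compound must satisfy the query."""
    return [
        name
        for name, compounds in patents.items()
        if any(query <= compound for compound in compounds)
    ]


query = {"phosphate", "phenyl"}
composited = search_composited(query)
exact = search_per_compound(query)
# Documents retrieved only because descriptors of different compounds
# were combined on one record.
noise = [name for name in composited if name not in exact]
print(composited, exact, noise)
```

Here `patent_1` is retrieved by the composited search even though its phosphate is an ethyl ester and its phenyl group belongs to a thiophosphate; with an average of 25 compounds per card, such coincidences would be expected to become more frequent.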

A distinct advantage of the RAMP system is that as the organic phosphorus patent file increases, the time of search in the RAMP system will increase only slightly, as a general rule. This is subject to the exception that when a descriptor common to a large percentage of all organic phosphorus compounds (PnO-1 for example) is asked, the time of search will in such instances be considerably increased by the size of the file. However, by skillful manipulation in asking the questions the searcher can avoid this situation in 90% of his searches. With regard to the CAMP system, as presently set up for phosphates and thiophosphates, the time of search is exactly proportional to the size of the file. The present collection of organic phosphorus compound patents indicates that the total number of patents in this field has more than doubled in the past ten years. Thus we can expect that if the total number of patents again doubles in the next ten or fifteen years, the time of search will increase only slightly in the RAMP system while it is expected to double in the CAMP system.

Table III also discloses an advantage of the CAMP system over the RAMP system with respect to the memory feature. In the RAMP system, where we now use the RAMAC 305, the system is machine limited. We at present have the ability to store only 5 million alphanumeric characters. While 5 million characters sounds like a large number, we have discovered that this number is exceeded with relative ease when coding organic elements and radicals in the manner of the present system. However, if the memory size of the machines increases as expected, this limitation will be removed. Use of another random access machine with a larger memory would also avoid this problem. The CAMP system, on the other hand, is unlimited as to the number of documents it will hold. Since the CAMP system memory is in effect the number of punched cards used, its memory is literally unlimited. Again this statement is subject to the proviso that the time of search is exactly proportional to the size of the file.

A further advantage of the RAMP system is that as the size of its file grows, little increase in noise can be expected, while in the CAMP system the amount of noise shows a considerable increase as the file grows larger and more closely related compounds are patented. Table IV summarizes the advantages of each system.

TABLE IV. ADVANTAGES

RAMP                                        CAMP
Large number of descriptors                 Large memory
Inverted file                               Low cost
Fast                                        Fast
No real increase in time as file grows
Varying scope unlimited
Little noise
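The contrast between a search time proportional to file size and one that grows only slightly can be sketched as follows. This is a minimal illustration under stated assumptions, not a model of either machine: the synthetic file, the descriptor names "common" and "rare", and the use of "records examined" as a stand-in for search time are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_file(n_docs: int) -> List[Set[str]]:
    """Synthetic file: every document is 'common', every 50th is also 'rare'."""
    return [
        {"common", "rare"} if i % 50 == 0 else {"common"} for i in range(n_docs)
    ]


def sequential_search(docs: List[Set[str]], descriptor: str) -> Tuple[List[int], int]:
    """Card-sorter style: every record passes through the machine once."""
    hits = [i for i, terms in enumerate(docs) if descriptor in terms]
    return hits, len(docs)  # records examined equals the file size


def build_inverted_file(docs: List[Set[str]]) -> Dict[str, List[int]]:
    """Inverted file: for each descriptor, the list of documents carrying it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i, terms in enumerate(docs):
        for term in terms:
            index[term].append(i)
    return index


def inverted_search(index: Dict[str, List[int]], descriptor: str) -> Tuple[List[int], int]:
    """Only the entries listed under the descriptor are read."""
    postings = index.get(descriptor, [])
    return list(postings), len(postings)


for n_docs in (693, 1386):  # present file size, and a doubled file
    docs = build_file(n_docs)
    index = build_inverted_file(docs)
    _, seq_work = sequential_search(docs, "rare")
    _, inv_work = inverted_search(index, "rare")
    _, inv_common = inverted_search(index, "common")
    print(n_docs, seq_work, inv_work, inv_common)
```

In this sketch the sequential pass always touches the whole file, while the inverted lookup touches only the entries for the descriptor asked. A descriptor carried by nearly every document (the "common" case) loses that advantage, which mirrors the exception noted for very widespread descriptors.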

The inverted file feature of the RAMAC is advantageous in that its time of search does not increase in direct proportion to the size of the file. The 3-4 minute average search time for the RAMAC indicates that it is definitely fast enough to far exceed the speed of manual searching and, furthermore, that it will eventually exceed the speed of the CAMP system when a larger file is developed.

The fact that questions on the RAMP can be varied in scope from generic to specific, and that a large number of questions can be asked of it quickly and efficiently, makes the RAMP system particularly advantageous for anyone interested in asking a great number of widely divergent questions. Patent Office use, for example, best illustrates the type of use which takes full advantage of the varying scope features of the RAMP system.

The outstanding feature of the CAMP system, however, is without doubt its low cost. First of all, the low rental cost of the machine makes the system particularly attractive. Second, it is expected that the Patent Office file of cards for this system eventually may be made available to the public. A third outstanding feature of the CAMP system is its speed. This, of course, is best illustrated by the fact that at present it takes only two minutes to search a file of 693 patent documents. The speed of the system depends essentially on the speed of the particular punch card sorter used. Table V summarizes the disadvantages of each system.

TABLE V. DISADVANTAGES

RAMP                  CAMP
High cost             Fair amount of noise
Limited memory        Increase in time as file increases
                      Limited dictionary
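The "limited dictionary" entry corresponds to the difference in descriptor specificity described earlier, where a radical such as ethyl or bicycloheptane is coded both specifically and generically in one system and only generically in the other. The following sketch illustrates that difference; the mapping table and function names are hypothetical and do not reproduce either system's actual dictionary.

```python
from typing import Dict, List, Set

# Hypothetical generic class for each specific radical, for illustration only.
GENERIC_OF: Dict[str, str] = {
    "ethyl": "alkyl",
    "butyl": "alkyl",
    "bicycloheptane": "cycloalkyl",
}


def code_specific_and_generic(radicals: List[str]) -> Set[str]:
    """Large dictionary: keep each specific term and add its generic class."""
    terms = set(radicals)
    terms.update(GENERIC_OF[r] for r in radicals if r in GENERIC_OF)
    return terms


def code_generic_only(radicals: List[str]) -> Set[str]:
    """Limited dictionary: each radical collapses to its generic class."""
    return {GENERIC_OF.get(r, r) for r in radicals}


compound = ["ethyl", "bicycloheptane"]
rich = code_specific_and_generic(compound)
limited = code_generic_only(compound)

# A specific question can only be answered from the richer coding;
# a generic question can be answered from either.
print("ethyl" in rich, "ethyl" in limited)
print("alkyl" in rich, "alkyl" in limited)
```

Under the generic-only coding an ethyl ester and a butyl ester become indistinguishable, which is why separating a specific compound from its homologs requires the larger dictionary.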

As previously stated, the limited memory aspect poses a real problem for RAMP when it uses the RAMAC 305. Also its high cost may in many cases make the system prohibitive for certain users. CAMP's disadvantages are not believed to be particularly serious at the present time, but it is recognized nevertheless that they are factors to consider.

In conclusion, our analysis indicates that when cost is of extreme importance and the field of search is limited, the CAMP punch card system appears desirable. In the final analysis, the user must consider the various factors involved, weigh them carefully, and then decide which system best suits his needs.