Mechanized Searching of Phosphorus Compounds: I. Random Access

Doc. , 1961, 1 (1), pp 76–83. DOI: 10.1021/c160001a021. Publication Date: January 1961. ACS Legacy Archive. Cite this:J. Chem. Doc. 1, 1, 76-83. Not...
0 downloads 0 Views 659KB Size
MECHANIZED SEARCHING O F PHOSPHORUS COMPOUNDS I. RANDOM ACCESS MECHANIZATION* By J. FROME U. S. Department of Commerce, Patent Office, Washington, D. C.

INTRODUCTION The U . S. P a t e n t Office is now conducting mechanized s e a r c h e s i n the field of organic phosphorus compounds. Various techniques f o r mechanized s e a r c h i n g have been t r i e d previously i n the U . S . P a t e n t Office 112p3*4 using the s e r i a l or sequential f i l e . Quite r e cently another experiment w a s conducted with a n "inverted file" technique using p a r a l l e l acc e s s searching.6 The p r e s e n t s y s t e m borrows many of the concepts f r o m t h a t experiment but adds s e v e r a l f e a t u r e s . One of the advantages of a random a c c e s s i n v e r t e d f i l e technique is that the s e a r c h t i m e i n c r e a s e s v e r y l i t t l e a s t h e number of documents i n c r e a s e s . It is f e l t t h a t the use of a n e l e c t r o n i c computer having a l a r g e random a c c e s s m e m o r y m a y be one of the ans w e r s t o the p r o b l e m of mechanized s e a r c h i n g f o r organic chemical compounds. Some of the o b j e c t i v e s of the p r e s e n t s y s t e m , called "RAMP" (Random A c c e s s Mechanization of Phosphorus), a r e ( 1 ) to t e s t and evaluate the automatic f i l e p r e p a r a t i o n techniques previously developed; ( 2 ) to t e s t , develop and evaluate a random a c c e s s s y s t e m f o r searching organic chemical compounds using d i s c l o s u r e s of the phosphate e s t e r a r t f o r p r e l i m i n a r y d a t a p r e p a r a t i o n ; ( 3 ) t o aid the patent e x a m i n e r and the r e s e a r c h w o r k e r t o m o r e effectively and quickly s e a r c h the chemical l i t e r a t u r e and ( 4 ) t o d e velop p r i n c i p l e s f r o m the p r o b l e m s encountered i n this s y s t e m which m a y be applicable t o a univ e r s a l s y s t e m f o r s e a r c h i n g chemical compounds. Arts Selected. The RAMP p r o j e c t involves all the patents i n C l a s s 260, s u b c l a s s 461, as c l a s s i f i e d i n the U . S. P a t e n t Office. It a l s o i n cludes those organic phosphorus patents i n o t h e r s u b c l a s s e s which a r e obtainable under our p r e s ent c l a s s i f i c a t i o n s y s t e m . T h e r e a r e approximately 1200 patents c l a s s i f i e d officially i n c l a s s 260, s u b c l a s s 461, including o r i g i n a l and cross r e f e r e n c e s . The p r e s e n t phosphate p r o j e c t h a s added to t h e s e 1200 patents about 500-700 additional patents containing organic phosphorus compounds, making a total of approximately 1900 p a t e n t s . It is hoped eventually t o include o t h e r additional s u b c l a s s e s containing phosphorus and perhaps all t h e l i t e r a t u r e which contains phosphorus compounds, but at the p r e s e n t the p r o j e c t is l i m i t e d to 1900 patents a l r e a d y mentioned. System. S e v e r a l important f e a t u r e s of t h e p r e s e n t s y s t e m a r e ( a ) u s e of "inverted f i l e

-

--

technique," as h a s a l r e a d y been d e s c r i b e d 5*6 ; ( b ) the u s e of groups of a t o m s as building blocks ( N O Z , S03H), d e s c r i b e d i n the VS3 System3; ( c ) u s e of nodes to show r e l a t i o n s h i p s and g e n e r a t e combination t e r m s . H. P. Luhn9 h a s suggested t h a t some r e l a t i o n s h i p could be indicated by considering the connection between two e l e m e n t s as a node and t h i s idea h a s been f u r t h e r extended i n t h i s s y s t e m . A node is a collection of a t l e a s t two f r a g m e n t s o r building blocks. (d) The u s e of a "compound" digit, which is the fifth digit of a five digit code used i n the s e a r c h . The compound digit enables u s t o distinguish between s e v e r a l compounds i n the s a m e docum e n t and h e l p s avoid any searching "noise"; ( e ) u s e of combination t e r m s to show c e r t a i n t y p e s of connections, %., c h l o r i n e - r i n g ( C l - R ) and c h l o r i n e chain (CI-CH)r e p r e s e n t s that C1 i s attached t o a r i n g and chain, r e s p e c t i v e l y ; ( f ) the use of s e v e r a l l e v e l s of g e n e r i c i t y ( s e e (6); ( g ) generation of a n alphabetical index t o e a c h patent and alphabetic index of specifically named compounds 8. It i s believed that the s y s t e m can b e s t be d e s c r i b e d o r i l l u s t r a t e d by dividing i t into t h r e e p h a s e s , namely: ( 1 ) the f i l e p r e p a r a t i o n , (2) t h e f i l e organization and ( 3 ) the s e a r c h s t r a t e g y . 1. F i l e P r e p a r a t i o n . Some of the t e c h niques of f i l e p r e p a r a t i o n used h e r e i n have been d e s c r i b e d i n the P a t e n t Office Reports.8 The initial s t a g e of the RAMP p r o j e c t c o n s i s t s of (a) underlining i n each patent all the organic phosphorus compounds contained t h e r e i n ( F i g . 1). This job w a s done by organic c h e m i s t s who underlined e a c h and e v e r y o c c u r r e n c e of organic phosphorus compounds, The underlining was checked by a patent e x a m i n e r who i s a n authority i n the phosphorus a r t i n the U . S. P a t e n t Office. F r o m the p a t e n t s , the n a m e of each underlined compound w a s punched into a s e p a r a t e c a r d . If t h e r e w e r e fifty compounds, fifty c a r d s r e s u l t e d . After v e r i f i c a t i o n , the punched c a r d s then w e r e put through a n e l e c t r o n i c computer to e l i m i n a t e the d u p l i c a t e s and t o obtain a printed alphabetic l i s t i n g of all the organic phosphorus compounds named i n e a c h patent ( F i g . 2 ) . The compounds i l l u s t r a t e d by s t r u c t u r a l f o r m u l a s i n the patent w e r e not included on this l i s t , These p r i n t e d l i s t i n g s then w e r e attached t o the patents and given t o the organic c h e m i s t s . ( b ) The organic c h e m i s t s w e r e i n s t r u c t e d to w r i t e the s t r u c t u r a l f o r m u l a f o r e v e r y organic phosphorus compound on the l i s t . After all the s t r u c t u r a l f o r m u l a s w e r e w r i t t e n , the patents

*Presented before Division of Chemical Literature, ACS National Meeting, NewYork, N. Y., September 13, 1960.

76

-

RANDOM ACCESS MECHANIZATION IN PHOSPHORUS SEARCHING

77

FIG. 1. A COPY O F AN UNDERLINED PHOSPHORUS PATENT

2,875,229

United States Patent Office

patented Feb. 24, 1959

2

I 0

2,875,229

[C,HIOl,P=O

+ CH,tjOC,H,

PREPARAT

0

m

[C~TI,Ol:!'lOCillrl

5

Harry W. Coover, Jr., and R i c h r d L. McConneW KingsCompany! Po* TeM'v to Earban Rochester, N. Y., a corporation of New Jersey

No Drawing. Application February 14, 1956 Serial No. 565.300 6 Claims. (Cl. 2 6 0 - 4 6 1 )

This invention relates to a process for the Preparation

of neutral mixed phosphc'es In a specific aspect this i d e n t i o n relates to a process for p r e p a r i n g w t r a l m i ~

phosphates,having the structural formula: [RXI[R'XI [R'XIP=X wherein R and R' are radicals selected from the group consisting of alkyl, substituted alkyl, aryl and suhstitutcd awl and wherein X is either oxveen or sulfur.

n

+ CHiCI1 0 C ~ I I

T h e reaction is carried out for a period of 1 to 24 hours and at a temperature of 100 to 275' C. The preferred temperature range is from 125 to 240" C. for a period of time ranging from 4 to I 6 hours depending lo upon the reactants employed. When no catalyst is used in the reaction. temueratures. in excess of 200' C., for example about 250' kl, are employed. 'The reaction can, however, be carried out in the presence of a catalyst and when a catalyst is used, substantially lower tem15 peratures are suitable. For example, when a lead oxide catalyst such as litharge is used, a temperature of about d 130' C. is satisfactory. When a catalyst is used in the reaction, an amount within the range of 0.5 to 5% by weight and higher is usually employed. 520 Varying the reactants has an effect upon :he mixed hosphate esters roduced in the reaction. For ex& butyl acetate withitriethvl phosphate,\ f n the reaction

3

Example I.+Mixed h i t r y / ct!ry/ p / ~ o - ' p / i ~ i t e s / ride. "he only alternative is to react an alknli metal A mixture of 36.4 parts of trieth (1 hos hate 139.0 alkoxide with the chlorophosphatc but the yield5 from 40 parts of n-butyl acetate and 5,O'parts)of Pyellot plufnbous this procedure areLpoor due to comieling reactions. oxide (litharge) was heated under total reflux for 2 In accordance with this invention, it has been found hours with a pot temperature of 130- c. Then low tha neutral mixed phosphates can be produced economiboiling materials were removed froln the top of the discall! in excellent yields by re!cting aLphosPhaWelected tillation column within the 75-120° c. ranee for 12-14 from the group consisting of trialk 1 hos hates and tri- 45 hours. This distillate consisted of 'I n1ixGre of ethyl alkyl thiophosphates w h e r e i t the :I& r:dicals' con% and butyl acetate. .fhe remainder of the exceps butyl UP t o 8 carbon Per alkyl radical, with a lower acetate was removed by distillation at atmospheric prescarboxylic acid ester. The products OF this invention sure. me reaction mixture was then filtered to remove have the structural formula: the catalyst residue and vacuum distilled. After remov50 ing 13.4 parts of unreacted triethyl phosphate, 16.1 parts CRXl [R'XI [R'XIP=X . 84- ' . a j 2 . 5 mm.) of but 1 dieth 1 hos hate wherein R and R' are radicals selected from the group a i d 7; partsYoPdib&lI : $ ;hosphs;te?B. P. 95-98' consisting of alkyl containing up to 8 carbon atoms, C. at 2.5 mm.) :ere collected. It is unneSessarv to fracsuch a s me!hyI, ethyl, butyl, octyl and the like, halotionate the two hos hates and the entire product boiling alkyl, containing up to 8 carbon atoms, such as chlorowithin the r a n g w at 2.5 m m . can be used since propyl, bromobutyl, and the like, and aryl and subthis mixture makes a n exceilent plasticizer for cellulose stituted aryl, such as phenyl, cresyl, chlorophenyl. nitroesters. This mixture of butyl ethyl phowhates ,can be phenyl, and the like. In these products R and R' ore used alone or in comb?nation with other conventional different and at least one of R and R' is an alkyl radical. plasticizers to give any desired flow. Cellulose esters X is either oxygen or sulfur. In the reaction, the^ containing 15-20 parts of this mixture offlhosphateslare alkyl phosphate is reacted with an ester having the sel f-extinguishing. structural formu a: Example 2.+Mixed methyl octyl plrospl~otesJ 0

'

A mixture of 28.0 parts of trimethyl phosphate 206.0

R,t,d-o-R,,

JJl

In this c a r b o x d acid ester. R" is the same and R' but diEerent ir% the alkyi radicals ,:,i t h q t r F y l p h r hate or thio h s hate reactant. R is il ower alnyl suc as met y , ethyl, propyl, butyl and the like.

65 parts of octyl acetate and 5.0 !arts of litharge was $acted

according to the procedure in Example 1 to give a mixture of\dimethyl octypndrdioctvl methvl uh0sDhate.p Exnrnple +Misea eliiyl pltcrrw plwsp?oies When triethvl DhosDhate is reacted with butvl acetate 70 A mixture of 36.4 parts of yielhvl Dhosphate 164.0 in accordance bith'this invention, the reaction can be parts of phenyl acetate and 3.0 parts of litharge i a s reexpressed by the following equation: acted according to the procedure in Example-l to pro-

,

J. F r o m e

78

FIG. 2. PRINTED LIST OF COMPOUNDS IN PATENT 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229 2875229

BUTYL DIETHYL PHOSPHATE BUTYL ETHYL PHOSPHATES CHLOROPHaSPHATE DIBUTYL ETHYL PHOSPHATE DIETHYL ISOBUTYL THIOPHOSPHATES DIETHYL 0 NITROPHENYL THIOPHOSPHATE DIETHYL PHENYL PHOSPHATE DIISOBUTYL ETHYL THIOPHOSPHATES DIMETHYL OCTYL PHOSPHATE DIOCTYL METHYL PHOSPHATE DIPHENYL ETHYL PHOSPHATE MIXED BUTYL ETHYL PHOSPHATES M E E D PHOSPHATES M E E D PHOSPHATE ESTERS M E E D ETHYL ISOBUTYL THIOPHOSPHATES MIXED ETHYL 0 NITROPHENYL THIOPHOSPHATES MIXED ETHYL PHENYL PHOSPHATES MIXED METHYL OCTYL PHOSPHATES NEUTRAL MIXED PHOSPHATES NEUTRAL M E E D THIOPHOSPHATES ORGANOPHOSPHORUS COMPOUNDS PHOSPHATE PHOSPHATES PHORPHORUS OXYCHLORIDE PHOSPHORYL DICHLORIDE TRIALKYL PHOSPHATE TRIALKYL PHOSPHATES

2875229 2875229 2875229 2875229 28 75 229 2875229

TRIALKYL THIOPHOSPHATES TRIETHYL PHOSPHATE TRIETHYL PHOSPHOROTIUTHlOATE TRIETHYL THIOPHOSPHATE TRIMETHYL PHOSPHATE THIOPHOSPHATES

w e r e then checked to s e e whether t h e r e w e r e any other o r g a n i c s t r u c t u r a l f o r m u l a s in the patent which had been w r i t t e n a l r e a d y , and such f o r m u l a s w e r e included with the f o r m u l a s on the l i s t . The patent then was r e a d carefully t o make s u r e that a l l the r e a c t i o n products containing organic phosphorus a l s o w e r e r e p r e s e n t e d by s t r u c t u r a l f o r m u l a s . Thus t h e r e was obtained a s e r i e s of s t r u c t u r a l f o r m u l a s f o r e v e r y organic phosphorus compound e i t h e r specifically named o r shown by s t r u c t u r a l f o r m u l a o r r e s u l t i n g f r o m any r e a c t i o n product in the patent. In t h i s list not only w e r e s t r u c t u r a l f o r m u l a s drawn f o r the specific compounds l i s t e d but a l s o those g e n e r i c f o r m u l a s f o r which no s p e c i e s had been w r i t t e n . Upon completion by the o r g a n i c c h e m i s t of the s t r u c t u r a l f o r m u l a s , t h e s e f o r m u l a s then w e r e thoroughly checked and vecified by a n o r g a n i c c h e m i s t skilled in the phosphorus a r t . ( c ) Markush. This t e r m i s used in the U. S. Patent Office t o designate a n a r t i f i c i a l genus. One of the m o s t common ways of r e p r e senting 'a Markush g r o u p i s by u s e of a s t r u c tural formula such as:

-

X may be a m e m b e r of the group consisting of OH, NH2, S 0 3 H and Y may be a m e m b e r of the g r o u p consisting of methyl, ethyl and phenyl.

S t r u c t u r a l f o r m u l a s which w e r e in the f o r m of the Markush group w e r e a l s o included in the above list and w e r e l a t e r t r e a t e d in a s e p a r a t e way ( s e e i n f r a ) . ( d ) Encoding is a v e r y important a s p e c t of t h i s s y s t e m and in o r d e r to understand m o r e thoroughly how it is done, we give a d e s c r i p t i o n of the method of fragmenting and nodalizing and the t e r m s used t h e r e i n . A f r a g m e n t i s a chemical element o r a collection of chemical e l e m e n t s t r e a t e d a s a unit f o r chemical o r information r e t r i e v a l p u r p o s e s . It is a component p a r t of a s t r u c t u r a l f o r m u l a and a s e r i e s of designated f r a g m e n t s will constitute a s t r u c t u r a l f o r m u l a . A node is a coll'ection of a t l e a s t two f r a g m e n t s . F o r the purpose of t h i s s y s t e m e v e r y node w i l l contain a phosphorus atom a s one of the f r a g m e n t s . An example of a compound to be f r a g m e n t e d and nodalized is the f o r m u l a 0

GENERAL RULES Explanation of F r a g m e n t a t i o n and Nodalization. The f r a g m e n t a t i o n and nodalization of e v e r y phosphorus s t r u c t u r a l f o r m u l a begins with and r e v o l v e s around the phosphorus atom in the formula. (1) The f i r s t s t e p i s t o designate the e l e ment o r e l e m e n t s which a r e d i r e c t l y connected to the phosphorus atom and a l s o the number of o c c u r r e n c e s of t h e s e connections. The phosphorus a t o m and a l l the e l e m e n t s o r f r a g m e n t s d i r e c t l y connected to it will h e r e a f t e r be d e s i g nated a s the phosphorus nucleus. Example: P = O - 1 P-s-2 P-0- 1 ( 2 ) The second s t e p i s t o designate the f i r s t node. The f i r s t node c o m p r i s e s the phosphorus nucleus and the f r a g m e n t o r f r a g m e n t s which a r e d i r e c t l y connected t o the phosphorus nucleus and t h e i r number of o c c u r r e n c e s and a l s o the a p p r o p r i a t e symbols i n the c a s e of a carbon chain, m e t a l s , halogens, Example: P-S-C3-NT-SAT-ST-CH-2 ( F i g . 3 ) P-0-PHENYL- 1 ( 3 ) The t h i r d s t e p i s t o designate the f r a g ment o r f r a g m e n t s which a r e d i r e c t l y connected to the phosphorus nucleus and also the number of t i m e s they o c c u r . Example: C3-NT-SAT-ST-CH-2 ( F i g . 3) PHENYL- 1 ( 4 ) The f o u r t h s t e p i s to designate all of the f r a g m e n t s which a r e connected e i t h e r d i r e c t l y o r i n d i r e c t l y t o the f i r s t node a s d e s c r i b e d in ( 2 ) above, using the a p p r o p r i a t e symbols and r e c o r d i n g the number of o c c u r r e n c e s of e a c h of the f r a g m e n t s .

-

&.

EtANDOM ACCESS MECHANIZATION IN PHOSPHORUS SEARCHING

Example: C 1-CH-2 C-T-R-I (5) The fifth s t e p is t o designate the position of all the f r a g m e n t s on the r i n g , that is, whether they a r e o r t h o , rneta o r p a r a , and a l s o the t o t a l n u m b e r of f r a g m e n t s i n the s t r u c t u r a l f o r m u l a as well as the valence of the phosphorus a t o m in the compound. Example; P (Para) 12 F r a g Phos 5

482211224567

C3274

I

79

C~-&T-SIT-ST-C*-~

I

I

Ill

II I I I

FIG. 3.bEFINI'TION OF TERMS USED IN FRAGMENTING = T.=nllind = Nm-Termuial = Saturated UNS = Unsaturated S T = Straight BRAN = Brancbed ( 3 1 . cbain R = Ring o = Onbo-two substituents on ring side by side (vicinal) N = Meta - two substituents on ring one space removed P = Para-two substituents on ring two spaces removed No s p e c i e s used, only with generic terms NS CYCL = Cyclic PHOS = Phosphoru:s FRAG = Frnnments AMINE SALT-P = Primary amine salt AMINE SALT-S = I;econdary amine salt AMINE SALT-T = Tertiary amine salt AMINE SALT-Q = Quaternary amine salt POLYMER P = Polymer of phospborus

T N T SAT

FIG. 4. SHOVING THE FORMAT OF THE PUNCHED CARD USED IN LOADING AND DESCRIBING THE FRAGMENTS AND NODES

GIs. Cols. GIs. Col. GIs.

Ill Level Address Patent Number Accession Number Compound Number Fragment or Node

1-5 612 1619

20 35-80

I

The above s t r u c t u r a l f o r m u l a t h e r e f o r e will r e s u l t i n t h e s e f r a g m e n t s and nodes: P=O-1 P-0- 1 l - f P-s-2 P-S -C 3 -NT -SAT -ST -CH- 2 P-0-Phenyl.-1 C 3-NT-SAT-ST -CH- 2 Phenyl 1

-

C-T-R-I C 1-CH-2

P (Para) 12 F r a g Phos 5 Detailed i n s t r u c t i o n s on f r a g m e n t i n g and nodalization will be available f r o m U. S. P a t e n t Office, R. & D. ( e ) Loading; into the F i l e . After the c o m pounds have been f r a g m e n t e d and checked, e a c h f r a g m e n t is punched into a s e p a r a t e c a r d . Thus if a compound contains t e n f r a g m e n t s i t will have ten fragment c a r d s . The punched c a r d ( F i g . 4 ) , b e s i d e s containing the identity of thLe f r a g m e n t , a l s o contains the patent n u m b e r , the a c c e s s i o n number and the compound n u m b e r . In the s y s t e m the patent n u m b e r s a r e d e s i g nated by a c c e s s i o n n u m b e r s which a r e f o u r digit n u m b e r s . When a patent contains m o r e than 36

-

compounds, the patent is r e p r e s e n t e d by m o r e than one a c c e s s i o n n u m b e r . T h u s , f o r e x a m p l e , a patent which contains 46 compounds i s r e p r e s e n t e d by two a c c e s s i o n n u m b e r s and a patent which contains 80 compounds is r e p r e s e n t e d by t h r e e a c c e s s i o n n u m b e r s . The compound numb e r is the fifth digit, a number o r l e t t e r , added t o the a c c e s s i o n n u m b e r . T h i s gives u s the possibility of 36 compounds p e r a c c e s s i o n n u m b e r . As will be s e e n l a t e r on t h e r e i s no possibility of confusion between the v a r i o u s compound n u m b e r s . The c a r d s , e a c h containing a f r a g m e n t , d e r i v e d f r o m the s t r u c t u r a l f o r m u l a s , then a r e put through a tabulating machine and the f r a g m e n t s a r e l i s t e d compound by compound. In the c a s e of the M a r k u s h f o r m u l a t i o n , e a c h M a r k u s h g r o u p is r e p r e s e n t e d a s a single compound even though it m a y in f a c t be a r e p r e sentation of many, many compounds. S e p a r a t e compounds r e p r e s e n t e d by s e p a r a t e M a r k u s h g r o u p s a r e given s e p a r a t e compound n u m b e r s . After c o r r e c t i o n s have been m a d e , a l l the punched c a r d s r e p r e s e n t e d by a l l the patents then a r e s o r t e d alphabetically and a r r a n g e d in alphabetical sequence. T h e s e c a r d s next a r e p r o c e s s e d by the RAMAC, which automatically a s s i g n s a n a d d r e s s f o r e a c h f r a g m e n t and autom a t i c a l l y l o a d s it into the RAMAC m e m o r y . A list of the f r a g m e n t s and t h e i r a d d r e s s n u m b e r s and the number of t i m e s T t h e f r a g m e n t o c c u r s a l s o is obtained. This l i s t of f r a g m e n t s and t h e i r a d d r e s s e s and f r e q u e n c i e s constitutes our s e a r c h i n g dictionary ( F i g . 5). F i l e Organization. The f i l e is organized i n the s a m e g e n e r a l m a n n e r a s m o s t random a c c e s s o r i n v e r t e d f i l e s . Under e a c h a d d r e s s o r d e s c r i p t o r is l i s t e d the combined a c c e s s i o n number and compound number of e a c h compound

-

J. Frome

80

which contains the d e s c r i p t o r . This is c o n t r a r y to the n o r m a l situation a s p r e s e n t e d in the s e r i a l f i l e . The organization of t h i s file is substantially the s a m e as the organization of the file i n o u r However, t h e r e a r e two m a i n previous work. differences. The specifically named compounds a r e not individually r e c o r d e d within the computer but only the f r a g m e n t s composing those compounds a r e so r e c o r d e d . The computer contains t h r e e l e v e l s , a l l of which a r e m o r e or l e s s generic to.the f r a g m e n t . In effect the file is organized into four m a i n groups: l e v e l s 1, 2 and 3 in the computer and level 4 which is in f a c t a printed list of all the compounds in the patents exclusive of M a r k u s h and r e a c t i o n products. Level 1 is considered the m o s t g e n e r i c , level 2 is l e s s g e n e r i c and l e v e l 3 is the m o s t specific. A specific example of the organization of the file can be s e e n i n F i g . 6.

FIG. 6. LEVEL OF GENERICITY

I

I1

I11

_.

Heterocy *-Rings -

1

I1

I11

.

Mo;phoiine N,O

H-.’

Her erolvclic-S

Sii-e-N

*Piperidine N

eterocvl

-* ‘Pyridine N

‘%**T %azine N,S

FIG. 5 . THESE LISTED FRAGMENTS ARE EXCERPTS FROM THE I11 LEVEL, PHOSPHORUS DICTIONARY Address

Name

20017 20018 20019 20020 20021 20022 20023 20024 20706 20707 20708 20709 20710 2071 1 20712 20713 20714 20715 25882 25883 25884 25885 25886 25887 25888 25889 25890 25891 25892 25893

ACRIDINE-1 ACRIDINE-2 ACRIDINE-3 AL-CH-1 AGCH-I AL-CH-1 AGCH- 1 AGCH-2 C-GC-NT-R-7 C-GC-NT-R-8 C-GC-NT-R-9 C-GC-T-CH-1 C-@C-T-CH-10 C-GC-T-CH-11 C-GC-T-CH- 12 C-GC-T-CH-2 C-GC-T-CH-3 C-GC-T-CH-4 C2-T-SAT-ST-R-2 C2-T-SAT-ST-R-3 C2-T-SAT-ST-R-4 C2-T-SAT-ST-R-5 C2-T-SAT-ST-R-6 CZ-T-SAT-ST-R-7 CZ-T-SAT-ST- R-8 C2-T-SAT-ST-R-9 CZ-T-SAT-5T-CH- 1 C2-T-ST-CH-2 CZ-T-UNSST-CH C2-T-UNSST-CH- 1

No. of occurrences in Compound

No. of occurrences in File

1 1 1 1 1 4 1 1 1 1 1 18 1 1 1 4 2 1 18 10 4

5 4 2 2 1 2 1 47

SEARCH STRATEGY To identify a chemical compound f o r patent s e a r c h i n g p u r p o s e s i t i s believed that i t is n e c e s s a r y f o r a s y s t e m to be able to do s e v e r a l things, especially i n the phosphorus art: (1) to be able to identify e a c h of the f r a g m e n t s comprising a compound, ( 2 ) to be able to identify the number of t i m e s e a c h different f r a g m e n t o c c u r s i n the compound; ( 3 ) to be able to find the relationship

between t h e s e v a r i o u s f r a g m e n t s i n the c o m pound; (4) to be able to a s k the s e a r c h question e i t h e r v e r y specifically or generically, that i s , with any degree of genericity d e s i r e d . If a s y s t e m can do t h e s e , it usually will s e r v e the p u r pose of a patent s e a r c h . The f r a g m e n t s and the number of o c c u r r e n c e s i n a compound a r e found e a s i l y by.looking in the dictionary, which will have the name of the f r a g m e n t a s well as the number of t i m e s i t o c c u r s i n a compound. Relationships. Relationships a r e obtained by t h r e e devices, namely, (a) combination t e r m s , ( b ) grouping t e r m s , k., ring or chain, and ( c ) the fifth digit number in the a d d r e s s which shows all the f r a g m e n t s a r e in the s a m e compound. Genus and Species. Since the computer s e a r c h file is organized in t h r e e l e v e l s , it is possible to a s k f o r the compound by asking f o r a combination of f r a g m e n t s e i t h e r specifically o r generically. Some of the f r a g m e n t s m a y be a s k e d f o r by entering the file in the g e n e r i c f i r s t l e v e l and o t h e r s i n the m o r e specific t h i r d l e v e l . Thus by the u s e of the computer w e a r e able to achieve as much genericity o r specificity as required. 3 . Asking the Question. In asking the s e a r c h question, the e x a m i n e r studies the compound which is being c l a i m e d . He then decides what the e s s e n c e of the invention is and by using the s e a r c h dictionary s e l e c t s the combination of f r a g m e n t s and the relationships which will give h i m those documents that will m e e t the c l a i m . A specific i l l u s t r a t i o n of an actual s e a r c h will be i l l u s t r a t i v e . Example: It is d e s i r e d to find the compound

-

-

-

L1

RANDOM ACCESS MECHANIZATION IN PHOSPHORUS SEARCHING

The compound ments p=s-1 P-0-3 P-0-C-T-CH-2 C-T-CH-2

P-0- Phenyl - 1

-

Phenyl 1 C1-R-1 NO2-R- 1

0 M

P 12 F r a g Phos 5

111 Level

would contain t h e s e f r a g -

(P=S,one o c c u r r e n c e ) (P-0, t h r e e o c c u r r e n c e s ) (P-0-C [terminal C chain], two o c c u r r e n c e s ) ( C [ t e r m i n a l C chain] , two occurrences)

81

20495

20782

Cl-R-1

NOTR-1

12340 12356 12452 13332 14587

12340 12356 12452 13332 14568

20796

29847 12 Fragments

12340 12354 12452 13332 14569

12340 12351 12452 13456 14584

p - 0 0 , one o c c u r r e n c e

0

e n : -

occurrence

(C1 Ring attached, one occurrence) (NO2 Ring attached, one occurrence Ortho Meta Par a Total f r a g m e n t count Valence of P

One could of c o u r s e a s k f o r all the f r a g m e n t s and t h e i r r e l a t i o n s h i p a n d obtain the c o m pound. However, a m o r e g e n e r i c question could be a s k e d , k., "for all chlorine and NOZ-containing compounds attached to a r i n g and containing the configuration P - 0 - P h e n y l . ' ' In the computer s e a r c h , t h e r e a r e obtained 34 patents which a n s w e r the question f r o m a f i l e of 693 patents. Howeven, t o l i m i t the question and r e d u c e t h e number of documents r e t r i e v e d you m a y i n s e r t the d e s c r i p t o r "12 f r a g m e n t s " which m e a n s only those compounds containing 12 f r a g m e n t s are a c c e p t a b l e . When this, w a s done only eight patents w e r e r e t r i e v e d . The u s e of the d e s c r i p t o r "No. of f r a g m e n t s " i n a compound is a powerful tool. It enables a s e a r c h e r to obtain the compound exactly and e l i m i n a t e s those compounds which contain additional g r o u p s . F u r t h e r m o r e , it distinguishes the specifically named compoulrds f r o m those which a r e only included by the v i r t u e of being i n a M a r k u s h group. Upon using all the possible d e s c r i p t o r s f o r the compound, only s e v e n patents w e r e r e t r i e v e d . It should be noted that f o r m o s t compound questions it is sufficient to a s k f o r t h r e e to five of the m o s t unusual d e s c r i p t o r s . The r e s u l t s obtained will be a l m o s t as good as if all of the d e s c r i p t o r s had been a s k e d . In p r a c t i c e the a v e r a g e s e a r c h took about 3-5 minutes of m a chine t i m e . Of c o u r s e one could a s k the computer a m o r e g e n e r i c question " F o r a halogen containing compound having the halogen attached t o a n a r o m a t i c ring and containing the (P-0) g r o u p t h r e e t i m e s , " and the above compound would be retrieved. The table gives a g r a p h i c r e p r e s e n t a t i o n of what might be i n the c o m p u t e r ' s m e m o r y :

When the question is asked "Find a compound containing C1 and NO2 attached to a ring and containing a P - 0 - P h e n y l group,ll the c o m puter goes to the a d d r e s s 20495 (Cl-R-1) and t o a d d r e s s 20782 (NO2-R-1) and c o m p a r e s the five digit-numbers and s t o r e s the n u m b e r s that a r e the s a m e . Thus 12340 12356 12452 13332 It then goes to 28796 ( P - 0 - P h e n y l ) and c o m p a r e s the s t o r e d n u m b e r s with those under P - 0 - P h e n y l . This r e s u l t s in 12340 12452 13332 The l a s t digit, which is the compound n u m b e r , then is dropped and the four digit number then is t r a n s f o r m e d into the patent number by a d i c t i o n a r y look-up. If the question included the limitation that the compound should contain only twelve f r a g m e n t s , then the computer goes through the s a m e s t e p s but when i t t a k e s the s t o r e d 12340 12452 13332 and c o m p a r e s t h e s e n u m b e r s with those under a d d r e s s 29847 (12 f r a g ) , the resulting comparison will only give two a n s w e r s 12340 12452 T h e s e a r e then converted to patent n u m b e r s by the c o m p u t e r . In F i g u r e 7 t h e r e a p p e a r s an actual s e a r c h question s h e e t . T h i s s y s t e m is in actual u s e in the U . S. Patent Office, and s o m e actual s t a t i s t i c s f r o m i t s use a r e

PHOSPHORUS MACHINE SEARCHES (RAMAC) 1. 2. 3. 4. 5. 6.

Applications S e a r c h e d Total S e a r c h e s Made S e a r c h e s p e r Application T i m e to P r e p a r e S e a r c h Question Machine T i m e p e r S e a r c h Patents Retrieved per Search

52 200 3.8 2-3 min. 3-4 min. 10

PHOSPHORUS MACHINE SEARCH FlLE P a t e n t s Completed (6- 15-60) Chemical S t r u c t u r e s (compounds) Chemical Compounds p e r Patent Chemical F r a g m e n t s Chemical F r a g m e n t s p e r S t r u c t u r e Chemical F r a g m e n t s p e r Patent F r a g m e n t T e r m s in Dictionary F r a g m e n t s per Dictionary T e r m (frequency) Punch C a r d s Used

693 17,500 25 256,350 15 37 0 23,775 11 350,000

J. F r o m e

82

4. It a p p e a r s that t h i s s y s t e m can be extended to other a r e a s of o r g a n i c chemical compounds in which t h e r e is a c e n t r a l a t o m o r g r o u p of a t o m s serving a s a nucleus s u c h as "silicon" compounds, "boron compounds", %. Of c o u r s e it i s obvious that i t can be used f o r c e r t a i n specific a r t s if one wanted to use such a grouping a s "pyridine" r i n g as a c e n t r a l nucleus f o r the pyridine a r t . Acknowledgement. Acknowledgement is e x p r e s s e d f o r the contributions of M a r t i n A. Cannon of R . & D. and Frank M. S i k o r a of the Examining C o r p s of the U. S. Patent Office, f o r t h e i r aid in the a n a l y s i s and encoding o p e r a t i o n s .

We have made approximately 200 s a t i s f a c t o r y searches.

CONCLUSIONS It is f e l t that the method of file p r e p a r a t i o n , f i l e organization and s e a r c h s t r a t e g y o f f e r s a solution to mechanized s e a r c h i n g in the phosphorus a r t . S e v e r a l g e n e r a l p r i n c i p l e s m a y be gained f r o m our e x p e r i e n c e in using t h i s s y s t e m . 1. It i s i m p o r t a n t to u s e e i t h e r s e m i - a u t o m a t i c or automatic techniques i n f i l e p r e p a r a t i o n . 2. It is e x t r e m e l y i m p o r t a n t t o v e r i f y the a c c u r a c y of the f i l e and a n a l y s i s at e v e r y stage of operation. 3 . The random a c c e s s method of s e a r c h i n g f o r chemical compounds o f f e r s a promising approach.

EXAMINER

-

APPLlCATION NUMBER

STATUS

tl

QUESTIONS

ADDRESS

Cl-R- 1

24878

TOTAL c OF DOCU.

FREQ.

(UTED) ANSWERS

1

NO2-R-1

I

P-0-Phenyl-1

28229

46 1

I

41130

1

574

11

I

II

I

1009

I

2

I

461

CI-R- 1

24878

NOTR-1

28229

5 74

P-0-P h a yl- 1

41130

1009

12 Frag.

44345

772

DATE

RANDOM ACCESS M E C m N I Z A T I O N IN PHOSPHORUS SEARCHING

83

REFERENCES

‘Bailey, M. F., B. E. L.anham and J. Leiboaitz, “Mechanized Searching in the U. S. Patent Office,’’ U. S Department of Commerce, Patent Office, Washington 25, D. C., 1950. ’Frome, Julius.and Jacob Leiboaitz, “A Punched Card System for Searching Steroid Compounds,” Paient Office Research and Development Report No. 7, Department of Commerce, Washington 25, D. C., 1957. 3Leibowitz, J., J. Fmme and D. D. Andrens, “Variable Scope Search System: VS3,” preprints of papers for &e International Conference on Scientific Information, National Academy of ScienceeNational Research Council, Washington, D. C., 1958, pp. 291-316. 4Frome, Julius, Jacob Leibowitz and Don D. Andreas, “ A System of Retrieval-Compounds, Compositions, Rocesses and Polymers,” Patent Office Research and Development R e p a t No. 13, Department of Commerce, Washington 25, D. C ,1958. ’Nolan, John L., =can

Documentation, January, 27-35 (1959).

k e i h o a i t z , J., J. Frome and D. D. Andreas, “Variable Scope Patent Searching by an Inverted F i l e Technique,” Patent Office Research and Development Report No. 14, Department of Commerce, Washington 25, D. C., November, 1958. ’Ciassification Bulletin of the United S a t e s Patent Office, C l a s s 260-Chemistry, Carbon Compounds, Bulletin No. 200, Revision 1. ‘Frome, Julius, “Semi-,Automatic Indexing and Encoding,” Patent Office Research and Develcpment Report No. 17, Department of Commerce, Washink con 25, D. C., l%O. 9Luhn, H. P., “Nodal Index for Branched Structure,” IBM Research Laboratory, Poughkeepsie, N. Y., December, 1955. lo Mooerq Calvin N., Zauor Technical Bulletin No.

64, 1951.