
An Experimental Real Time Chemical Information System. J. Chem. Doc., 1966, 6 (3), pp 173–183. Publication Date: August 1966.

wealthier consumer, and continued deprivation for the smallest and neediest consumer. By first making sure, via personal contacts, that the small information consumer is aware of the information resources and services available to him, the network can, in time, encourage him to use them effectively. By continually bringing him useful publications, bibliographies, announcements, etc., and explaining them, the network can in time induce the small library and the technical person without library services to ask for things and to become an active network participant and client. There are, to be sure, exciting new information storage and retrieval devices, new communication devices, and a growing number of means of manipulating and transferring useful sources of information. But it is not enough merely to make libraries more efficient or to devise and install better or faster means of interlibrary communication. Far more important than how our library resources are connected is what goes through the lines and how it is used.


An Experimental Real Time Chemical Information System*

DAVID LEFKOVITZ
Moore School of Electrical Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104

and CLARENCE T. VAN METER
Institute for Cooperative Research, University of Pennsylvania, Philadelphia, Pennsylvania

Received February 21, 1966

A chemical information system is described which receives input queries from punched cards or remote teletypes; executes up to 25 queries on a time-shared basis; processes queries by molecular formula, structural formula or fragment, and descriptors; loads chemical records as a list structured file; adds new compounds by a registry number; and uses a specific disc loading strategy to optimize retrieval.

An experimental chemical information system has been developed at the University of Pennsylvania under contract to the U.S. Army Edgewood Arsenal. The system, which is under a two-phase development, operates in either a real time or batched mode. The first of these two phases has been completed and has resulted in a system which incorporates an IBM 7040 central processor and an IBM 1301 disc storage file, and which can be queried (in real time) from a single remote teletypewriter. An expansion in both the number of remote inquiry stations and the query versatility is planned in the second phase of the development.

This paper describes the first-phase experimental system, which is currently in operation, and indicates how its present capabilities can be further expanded. It is described as a real time chemical information system, which implies two distinguishing attributes, namely, real time and chemical information retrieval. Real time may be defined in various ways in accordance with what is considered to be either a convenient or a requisite system reaction time for the user. In general, however, a real time system may be characterized in the following way. The query that is put to the system by the user is executed immediately, and responses are returned immediately to the user as soon as they are found. Reaction times from a few seconds to one minute are generally considered as tolerable.

The other distinguishing attribute is that it is a chemical information system. Information related to chemical compounds is stored in the memory of this system and subject to access based upon various query modes, including structural formula, molecular formula, descriptive key (properties and applications), and nomenclature search. The system’s response to a query may include a registry number, which serves to tie this record either to other computer records or to more detailed documentation about the compound, the structural formula, the molecular formula, the nomenclature, bibliographic references, and applications.

The experimental system at the University of Pennsylvania has a planned two-phase development. The first phase has included the development of mechanisms for real time retrieval, utilizing molecular formula, structural fragment, and descriptive key screening and atom-by-atom searching. It was also designed to be a time shared system, but the multiconsole operation was reserved for the second phase of development.

The second phase of development is to include the full multiconsole operation, wherein a large number of typewriter consoles may simultaneously query the system from remote locations. It would also include a more advanced structural search program giving greater flexibility to the structural search query language and would include more efficient screening processes in order to increase the over-all system throughput.
In addition to allowing the molecular and structural formula queries, the Phase 2 system would permit querying by certain kinds of chemical nomenclature, including linear notations, and would be interfaced to other files that have information in greater depth. The entire development is viewed as a completely automated, centralized file with a limited hard core of information assigned to each compound record which may reference other files existing in various stages of automated development. This hard core of data includes at least the following:

(1) Registry number
(2) Molecular formula
(3) Structural formula
(4) Nomenclature
(5) Bibliographic references
(6) References to other data files
(7) Structural screens and compound descriptors
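As an illustration only, the seven items of core data listed above can be pictured as one record type. The following is a minimal sketch in modern Python, not the IBM 7040 record layout, which the text does not give. The class name, the field names, the use of plain strings for formulas, and the sample values (including the registry number 1001) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompoundRecord:
    """Hypothetical model of the core data held for one compound."""

    registry_number: int      # (1) ties the record to other records or documents
    molecular_formula: str    # (2) plain string here; the real encoding is not stated
    structural_formula: str   # (3) plain string here; the real encoding is not stated
    nomenclature: List[str] = field(default_factory=list)             # (4)
    bibliographic_refs: List[str] = field(default_factory=list)       # (5)
    other_file_refs: List[str] = field(default_factory=list)          # (6)
    screens_and_descriptors: List[str] = field(default_factory=list)  # (7)


# Made-up sample record; none of these values come from the paper.
ethanol = CompoundRecord(
    registry_number=1001,
    molecular_formula="C2H6O",
    structural_formula="CH3-CH2-OH",
    nomenclature=["ethanol", "ethyl alcohol"],
    screens_and_descriptors=["hydroxyl", "solvent"],
)
```

Item (7) is the field that would supply the keys under which a record is placed on lists; items (5) and (6) only point outward to material held elsewhere.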

The system is capable of responding in real time because it employs a random access disc file for storage of the compound records, and a file organization within the disc file called list structuring (1). By this means a large number of lists, based upon structural and otherwise descriptive features of the compounds, is created; a compound record may appear on any number of these lists and yet be recorded only once in the file. The technique was originally applied to problems of artificial intelligence like game playing by computer (2) and more recently has been successfully applied to a real time inventory control system (3). Furthermore, the experimental chemical information system, with a list structured file, may also be used in a batched search mode instead of in real time with greater efficiency than is available in present batched search systems.
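The idea that a record can sit on many lists while being stored only once can be sketched as follows. This is a hedged illustration, not the authors' disc layout: it assumes that each stored record carries, for every key it has, a link to the previous record on that key's list, and that a key directory holds the head and length of each list. The class name `ListStructuredFile`, its methods, and the sample keys are hypothetical.

```python
from typing import Dict, List, Optional, Set


class ListStructuredFile:
    """Toy in-memory stand-in for a list structured file (assumed layout)."""

    def __init__(self) -> None:
        self.records: List[dict] = []     # each compound stored once; index = address
        self.head: Dict[str, int] = {}    # key -> address of newest record on its list
        self.length: Dict[str, int] = {}  # key -> number of records on its list

    def add(self, registry_number: int, keys: Set[str]) -> int:
        """Store one record and thread it onto the list of every key it has."""
        address = len(self.records)
        links: Dict[str, Optional[int]] = {}
        for key in keys:
            links[key] = self.head.get(key)  # previous head, or None at list end
            self.head[key] = address
            self.length[key] = self.length.get(key, 0) + 1
        self.records.append({"registry_number": registry_number, "links": links})
        return address

    def search(self, keys: Set[str]) -> List[int]:
        """Return registry numbers of records that carry all of the given keys."""
        if not keys or any(k not in self.head for k in keys):
            return []
        # Walk only the shortest list and test the other keys on each record.
        shortest = min(keys, key=lambda k: self.length[k])
        hits: List[int] = []
        address: Optional[int] = self.head[shortest]
        while address is not None:
            record = self.records[address]
            if all(k in record["links"] for k in keys):
                hits.append(record["registry_number"])
            address = record["links"][shortest]
        return hits


f = ListStructuredFile()
f.add(1, {"hydroxyl", "C2"})
f.add(2, {"hydroxyl"})
f.add(3, {"hydroxyl", "C2", "solvent"})
result = f.search({"hydroxyl", "C2"})
```

In this sketch a search touches only the records on one list rather than the whole file, which is the property that makes such an organization attractive for a random access store.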

Before proceeding to a more comprehensive discussion of this system and how it might be used, some of the salient characteristics of both the Phase 1 and Phase 2 systems are enumerated.

SALIENT FEATURES OF THE PHASE 1 SYSTEM

1. The system receives input queries either from punched cards or from remote teletypes over long distance telecommunication lines.
2. The system provides output to either a high speed line printer or remote teletypes.
3. The system hardware is currently able to service up to four remote teletypes simultaneously but at present is programmed to service only one of the attached teletypes. This teletype may continuously send any number of queries before receiving answers to prior queries. The system will execute up to 25 of these queries on a time shared basis and will queue the remainder on magnetic tape.
4. The system may be used either in a batched or a real time mode of operation. In the former, the queries are saved on magnetic tape until the appropriate batch size is reached. In the latter the query goes into execution immediately and responses begin within approximately 10 seconds.
5. Presearch statistics are provided by the system which indicate an upper bound on the number of responses to the query, which approximates the number of retrievals. The user may then instruct the system to proceed with the search, terminate it, or modify the query. He may also terminate the search at any time after execution has begun.
6. Queries may be based upon one or any combination of the following:
   a. Molecular formula (exact match or ranges)
   b. Structural formula
   c. Structural fragments
   d. Descriptive keys
7. Compound records are loaded as a list structured file. The keys of the lists are both chemical fragments and descriptors of compound activities.
8. New compounds are added to the file in a batch by a registration procedure which checks for current existence in the file, assigns structural keys, and updates all required lists and key indexes of the file.
9. A specific disc-loading strategy called automatic classification is employed to optimize retrieval of the lists.

SALIENT FEATURES OF THE PHASE 2 SYSTEM

1. The present processing equipment may be expanded to accommodate up to 64 remote consoles in simultaneous operation.
2. The teletypes are to be replaced by chemical typewriters so that low volume output will include the structural formula. The line printer is to be provided with a special print chain so that the high volume output will include the structural formula. Alternatively, high volume output could be printed with a cathode ray tube (CRT) with camera, such as the SC 4020. The low speed remote typewriters would be used for real time browsing and low volume output. However, the output may be switched, at any time, from the typewriter to the magnetic tape at the processing center (which is subsequently printed on the high speed line printer or the CRT printer) for high volume responses. These responses, when printed, would then be sent to the querist.
3. Nomenclature (other than systematic, which can be entered as structure in the Phase 1 system) will be added to the query capability of the system.
4. The central file will be interfaced to local files containing data in depth.
5. The structural formula query language will be improved through the introduction of more versatile search programs and more effective screens.

EQUIPMENT CONFIGURATION

Figure 1 presents the equipment configuration of the experimental system. The central processor is an IBM 7040 computer with