Todai Scientific Information Retrieval (TSIR-1) System. I. Generation

for scientific literature data, STF, to beused in a scientific information retrieval system, TSIR-1, was developed by modifying the CAS Standard Distr...
0 downloads 0 Views 415KB Size
YAMAMOTO, KUMAI, NAKAKO, ET AL. (10) Papier, L., "Reliability of Scientists in Supplying Titles: Implications for Permutation Indexing," Aslib Proc. 15, 333-7 (1963). (11) Resnick, A., "Relative Effectiveness of Document Titles and Abstracts for Determining Relevance of Documents," Science 134, 1004-6 (1961). (12) Saracevic, T., "Comparative Effects of Titles, Abstracts and Full Texts on Relevance ,Judgments," Proc. Amer. Soc. Inform. Sci. 6, 293-9 (1969). (13) Smith, P. P.. "Increasing Recall of Keyword Indexes by Automatically Indexing Root Character Strings." I b i d . , 7, 275-7 (1970).

(14) Tell, €3. V., "Document Representation and Indexer Consistency: A Study of Indexing from Titles. Abstracts and Full Text using UDC and Keywords." I b i d . , 6,285-92 (1969). (15) Tocatlian. J. J., "Are Titles of Chemical Papers Becoming More Informative?." J . A m e r . Soc. Inform. Sci. 2115). 345-50 (1970). (16) Rindsor. D. A , , "A Format for the Standardized Citation of Both Published and Unpublished Documents." Ibid . 21(3). 195-7 (1970). (17) Zabriskie, K . H., and Farren. A , , "The B. A . S.I. C. Index to Bioiogicai Abstracts." Amer. J Pharm. E d . 32, 189-200 11968).

Todai Scientific Information Retrieval (TSIR-1) System. I. Genera tion, Updating, and Listing of a Scientific Literature Data Base by Conversational Input T A K E 0 YAMAMOTO.* TOMOKO KUMAI. K E N - I C H I NAKANO. CHlKATAMl IKEDA, TOSIYASU L K U N l l HlDETOSl TAKAHASI. and SHIZUO FUJIWARA

The University of Tokyo, Hongo, Tokyo, J a p a n Received August 30, 197 1

A file structure for scientific literature data, STF, to be used in a scientific information retrieval system, TSIR-1, was developed by modifying the C A S Standard Distribution Format (SDF). The use of STF allows one to merge data records originating in CAS and those generated locally. A set of programs for the generation, updating, and listing of a scientific literature data base in STF from a TSS terminal in conversational mode is described.

For a scientist who wants t o accumulate a data base of scientific literature for his own use and for t h e use of his scientific community, the CAS SDF'. 2 , file structure has several attractive characteristics. It is a highly flexible format a n d is largely based on t h e natural language in its expression of t h e content of t h e literature. However, it still leaves much t o be desired for generating new data records or for d a t a records obtained from other sources than CAS as input t o t h e d a t a base. We have been building a scientific information retrieval system (TSIR-1) using a HITAC 5020 T S S 4 ,', of t h e Computer Centre, University of Tokyo. The TSIR-1 system is based on a data structure named STF (for Simplified Todai Format, Todai being t h e abbreviated Japanese word for t h e University of Tokyo). S T F is closely related t o SDF,and its logical content is similar to t h a t of SDF except for a number of modifications. Although more detailed description of STF will be published elsewhere, significant modifications follow.

The text of the abstract and the full text of an article are given their own identification (ID) numbers and may be recorded in STF files. Both Japanese (Romanized) and English entries are allowed for natural language input. The search programs and queries are expected to take care of the occurrence of the two languages in the retrieval phase. The SERIAL KUMBER data element,'ID number 0012 01, and the Temporary Abstract S u m b e r (TAS) data element, ID number 0054 01, in SDF' are modified to allow the use of these data elements for the file key in STF. The modification allows one to distinguish, if necessary, between the locally generated data records and records originating. in CAS SDF files, after they have been merged into a unified data base. Thus, the compatibility as well as the distinguishability of the two kinds of records are obtained a t the same time. Only capital letters are used in the alphabet. This restriction was necessary because our input-output facilities do not include the lower-case alphabet

In S T F , all keywords corresponding to a n article item are combined into one record element, and t h e element is included in the same logical record as the other data related to the item. Thus, each logical record corresponds t o one article except for the case when t h e data are too great for one logical record (maximum record length: 3520 characters).

In the rest of the present paper, a set of programs for t h e generation, updating, and listing of STF data as TSS disc files by conversational input will be described. The programs were written t o help T S S users to generate and maintain their own literature files, without having a detailed knowledge of SDF or of STF. The programs were written in PL/IW language,', a subset of PL/I. They were compiled and executed by the HITAC 5020 TSS.

'To whom correspondence should be addressed a t Department of Chemistry, Faculty of Science. t h e University of Tokyo. Hongo. Tokyo 113. Japan.

228 Journal of Chemical Documentation. Vol. 1 1, No. 4 , 197 1

TODAI SCIESTIFIC INFORMATION RETRIEVAL SYSTEM GENERATION OFTHE FILE

... ...

I N P U T AUTHOR N A M E S . O N E ny O N F FUJIYARA s

e..

The execution record of t h e program, GENERATE-STF, is given in Figures 1 and 2. In these figures as well as in subsequent ones, information input by t h e user is underlined. In Figure 1, t h e system asks t h e format of the file. First, t h e number of authors in the first article item is asked. Then. t h e presence of several elements likely t o appear in a typical scientific record is asked. The user replies by typing in 1 or 0 for t h e affirmative or t h e negative answer, respectively. Next, t h e system proceeds to ask if there are other elements t h e user wants. For these additional elements only the user I S expected t o supply t h e system with t h e corresponding ID numbers and modifiers applicable to S T F . In Figure 2, t h e system requests t h e input of data elements according to t h e format determined above. For d a t a elements which are expected to be shorter t h a n 68 characters (one input line), the system assumes t h a t a n input d a t a element ends when t h e ‘return’ key or the ‘line feed’ key is pressed. For other date elements, ‘equal‘ key followed by one of t h e above keys is assumed to be t h e end of t h e input string. The character count is taken, and t h e input is edited, if necessary, to form an STF d a t a element. Finally. d a t a which did not appear in t h e specification of t h e format, ABSTRACT NUMBER. is requested. This must be a four-character string which is uniquely assigned to the logical record in t h e user’s file system-that is, not only in t h e present file but in all the files which are expected to be used together in a subsequent retrieval operation. The input d a t a are transformed into the STF SERIAL KCMBER, ID number 0012 01, a n d into the S T F TAN, ID number 0054 01, by connecting with strings ‘XX00,’ and ‘XX000,‘ respectively. The above data are used to form d a t a element tags and d a t a elements, and these are combined t o form a n STF logical record in a buffer area in t h e CPU. The remaining size of t h e buffer area is notified to t h e user in t h e LIKE (80 characters) units. Then, it is asked whether more data input is desired. If t h e user answers in t h e affirmative, t h e system asks t h e number of authors of t h e next article item. As to t h e other kinds of d a t a elements, t h e system assumes that t h e same format applies a s in the first logical record.

IYPU? HECOhn F O m A T HOW M A Y Y A{’THOhS’

... 3.

... -

AOYAGI K UIYhVAE T

INPUT T I T L E S I N P U T .-Sl(H

AT THE mD

N U C L E l D l C M A S S M E A S U R W M T BY I O N CYCLOThON RESONPNCL.

.e.

... ... -

INPUT J O U W A L N W E B L L L . C H a . SOC. J A P A N

I N P U T W L W E NUMBEW

43

... ... -

INPUT PAGE I N P U T YEAR 1970

I N P U T DATA ~ ~ A E N T I I O N O .4 0 9 b - - - - - S I ( H

A T T H E END

A NEW UETHOD FOH MEASURING THE N U C L F A h M A S S BY I O N CYCLOTRON

..e

... ...

RESONANCE SPECTROSCOPY

I S D F S C R l BED.

1 H F M E l H O D I S Akn.1 ED

TO THF D E T F M I N A I I O N O F T H L NUCLFAR uass O F ARGON-40. T H F ACCllhACY OF 1 H E UETHOD I W D P O S S 1 8 L E S O U R C E S O F ERROR A R E

. ... -

DISCUSSED..

I N P U T A B S T U C T N L U B L R I N 4 CHARACTERS 0021

L I N E S L E F T = 38 I N P U T I I F N m T DATA I N P U T I N P U T 0 I F I N P U T W D AND P I L E h E G I S T h A T I O N WANTED

... 0

... e

INPUT F I L M A M E IN 4 D I G I T S FILENIWE=

FlLEBCSP

C P U T l MF 005. J S E C

Figure 2

The conversational generation of an STF file

The input of the content and the registration of the file

Then, input of t h e content is repeated as in t h e first item. ( T h e messages requesting t h e input of data elements are typed out in abbreviated forms for t h e second time on.) If, on t h e other hand, t h e user answers in t h e negative, t h e system proceeds to write t h e disc file from t h e buffer area. The name of t h e file is generated by connecting t h e four characters given by t h e user with a fixed string, ‘FILE’. The resultant eight-character name satisfies, as long as it is unique in t h e user’s S T F file system, all t h e necessary conditions for a n S T F file name. As t h e user needs t o know t h e file name for future access t o t h e file, he is notified of the resultant file name.

KFYVOhDSs I Oh 01 0

... JOUMAL ... -

UPDATING THE FILE

ARTICLE l l T l F , I O h O? I

N A M E I I Oh 0 1

1

ox

... 0 ... ... I o n man. ... ... CODW,I

07

VOLUME NUMBERII O h 01 I _. PAGE, I

01

I O R 07

I

ANY O’IFIEh CATEGOMY, I

O h 07

I

... lNPUT&OD

INPUT l P I 0 I N 4 D I G I T S 4096

... ci ... -

IN 2 D I G I l S

ANY O’MEh CATEG0HY.I

O h O?

0

Figure 1 .

The conversational generation of an STF file

The determination of the input format

STF files t h u s generated can not be updated by t h e usual UPDATE command; for a HITAC 5020 TSS file, because of their compact a n d complex structure. A program was written which can be used by t h e user to cancel one logical record, t o replace one logical record for a new one, or to extend t h e file by several logical records. The execution record is given in Figure 3 for t h e replacement. First, t h e name of t h e file t o be updated is requested. Then, t h e ABSTRACT KUMBER of t h e logical record to be updated is requested. As this is input in four characters and is processed in t h e system to generate a n S T F SERIAL NUMBER d a t a element applicable only for the locally generated records, t h e user’s updating a record originating in a CAS S D F file is blocked. For file extension, a dummy number ‘0000’ must be given as t h e ABSTRACT NUMBER. T h e system then reads t h e appropriate STF file in from t h e T S S Journal of Chemical Documentation, Vol 1 1 , No 4, 1971

229

YAMAMOTO, KUMAI, NAKANO, ET AL. $ ~ " b D A T L ' S U

I N P I I l ST? D I S C P I L I NwE ..a

e..

...

... 1

00591001 00591002 00591003 005B1001

OOOOA 00008 00009

005D1001

00016

005F1201

OD002 00003

INP'IIl HFCOhD SEhlLb NUMUfh I N 4 OI(11Tb INPIIT 0 0 0 0 I N 4 D I G I T S ?OW DA1A L X l W S l O N

pppl INPUT I ?OH DATA mcn~ccnmf INPUT 0 TON DATA CANCFLLASION INPUT - 1 FOR DATA L X T m S I O N

INPUT WLCOWD F0)PIAl HOW MANY AUTHOWS? 6..

2

... ?. ... 1

00035

00611301 005E7001 10001001

OOOOf OOOFl

00121001

00008

00541001

00009

U F Y M H D S , I Oh O?

AHTICLL TI n c . I o h 07

... 0 ... 1 ... ?.

ow

...... 1

SOC.

JAPAN

M A 5 5 BY I O N C YCLOTHON HLSONIWCE S k F C T h O S C O P Y I S D E S C H I B I D . THF M ETHOD I5 A P I - L I E L 10 M I D F I E l U 4 I N A l I O N O F THE NUCLEAH MASS O F ARGON-40. T H E ACCUhACY OF T H L METHOD AND P 0SSIWI.I SOUHCES OF EHhOR ARE D I S C U S S E D . x x 0.0.0 0 2 1 XX0000021

C,PUTlMF 0 0 2 . l S F C

on

07

SRW( P R I N T - S T F ) I N P O T STF D I S C F I L E NAME

P A G C r l O R 07

I

.e.

OR 07

FILEBCSJ

I N P U T S E h I E S NLMBEh I N 4 D I G I T S

... 0

...

o?

M Y o w E h c a T c m t t Y , i oh

00711001

00038

00597001

OOOOA 00008

I O N CYCLOTRON RESONANCE SPECTROSCOPY,

.._.".

AhGON-40,

ACCU

sa?"

I N P U T KFIWOhDS SLPAMTCD 0Y CQIMAS.INPUT

... ... ... ... ...

CHFM.

561 0.0 .0 .0 .7 0 A N E L MFTHOD F O h M F A S U h I N G T H E N U C L F A h

... 0

O?

WLWL NIWBEH,I

YUR.1

r

BlhL. 43

I N P U T I I F NFXT S F h 1 1 5 NO I N P U l I N P U T 0 I F INPLlT END

JOWU N W E o I OR O? CODM.I

FllJlUAhA S AOYAI.1 K MIYARAF 1 N U C L E I D I C M A S S M I A S L ' h W W T BY I O N CYCLOThON hLSONANC

.-SI01 AT ?"E D I D

ION CYCLOTRON R F S O N M C E SPECTROSCOPY, AhQXi-4Q4

00597002 00591003

00009

0 0 5 ~ i o o i 00035

FUJIYARA S AOYAGI K MIYAPIAE T N U C L E I D I C MASS MEASUREMENT n y I O N CYCLOTRON WESQNANC F

nccuiucy.

00551001

I N P U T AUTHOR N M E S , O N E

BY ONE

FWIUIRA 5

AOYAGI I[ MlYWAE

T AT THE MD

I N P U T T l T L E , I N P U T .-Sl(iN

N U C L E I D I C M A S S MEASUHUE74T 0Y ION CYCLOThON RESONANCE.

... 2 ... 561 ... -

BCSJAB 43 561

005t1001 00006 000070 0 0 1 8 7 0 0 1 00008 xx000021 00541001 0 0 0 0 9 XXOOOOO21 I N P U T I I F NEXT SERIES NO I N P U T I N P U T 0 I F I N P U T END 0

...

CPUTIME 0 0 2 . 6 S E C

Figure 4. updating

...

I N W T CODEN

00006

0 0 5 ~ i m o i 00002 00611301 00003

The listing of the STF records before and after the

INPUT W L w r NWBER

I N P U T PAGE I N P U T YEAR 1910

I N P U T ABSTRACT NLUBER I N 4 CHAHACTEHS

*..

L I N E S LEFT. LAST H m n n I- HECORD L I N E S usrm INPUT F i L m

...

41 DIRECTORY IN F I L E 4

w E I N 4 DIGITS

FILRIPMLIT=FlLLBCSJ n m s r DELETE- F I L F B C S P CPUTIMF OOB.4SFC

Figure 3

-

The updating of an STF file record

file. It then requests the user to specify t h e kind of work to be done: replacement, cancellation, or extension. The old file is then divided into logical records, t h e records are searched for t h e SERIES NUMBER record element, and t h e value of t h e element is compared with the one given by t h e user. Appropriate work as specified by t h e user is performed on t h e logical record thus accessed. The updated records are placed in a buffer area, and finally it is filed under a new file name generated by t h e same technique as described before. The new name is given to t h e user, a n d t h e system advises t h e user to delete t h e old file as it will otherwise be left in the TSS file. LISTING O F T H E FILE

Because of t h e complex structure of a n STF file, a usual file PRINT command' yields a n unintelligible terminal 230

Journal of Chemical Documentation. Vol 1 1, No 4, 1971

output. A program whose execution record is shown in Figure 4 asks t h e user to give t h e STF file name and t h e ABSTRACT NUMBER of t h e logical record t o be displayed a t t h e terminal. T h e S T F file is read in, t h e appropriate logical record is assessed, and its record elements are displayed together with t h e d a t a element tag and the total character counts. DISCUSSION Any local community of scientists should have its own data base on which a retrospective search may be performed by t h e member, preferably on a n on-line basis. By 'local community' is meant such groups as a professional society in a country, a regional branch of t h e society, a college, a department of a university, a n d a research group of active scientists. The bulk of t h e d a t a base may be generated conveniently by retrieving relevant d a t a records by t h e current awareness mode of operation (generation of subfiles instead of bibliographic listings is needed) from some largescale d a t a bases such as CAS a n d MEDLARS tapes; however, some means of t h e members' supplementing t h e d a t a base with their own data records must be available if it is going to be t h e group's central data base. The members cannot, and should not be expected to, know all of t h e intricate rules a n d techniques which enables t h e system t o manage t h e inhomogeneous files thus generated. The system has to assist t h e users with a set of commands which: Outputs t h e necessary instructions for t h e user Edits t h e input d a t a into forms suitable for storage Protects t h e user against coincidental errors such as giv-

R E F E R E S C E LITERATURE T O THERMODYSAMIC DIAGRAMS ing the same TAX t o a private record as is used in a CAS file,? or giving a n inappropriate file name The present system, TSIR-1, is designed to be such a system for the scientific community at the University of Tokyo. In the system, S T F records are generated by the on-line input described above, by a tape-to-tape conversion process of S D F tapes, and by the retrieval of large-scale S T F files on the 'current awareness' basis. The programs described above were developed for use as commands in the system. As a typical Japanese user is not expected to be a good typist, the programs were made to work with as simple input from the user as is compatible with flexibility of its functions. For the same reasons, numbers '1' and '0'.rather than 'YES'. ' S O ' or 'Y', ' S ' ,were requested as the affirmative and the negative responses from the user. The programs have been successfully used by several TSS users as their private programs, and the data files thus generated within their TSS files have been used for on-line generation of KWIC indexes of both English and Japanese title data elements. The programs will be registered as commands in the system in the near future. ACKNOWLEDGMENT

The stimulating discussions of Haruo Hosoya and other members of the Chemical Information Seminar held at the Department of Chemistry. University of Tokyo, are grate-

fully acknowledged. Part of t h e work has been supported by a fund from t h e Ministry of Education of Japan, Tokutei-Kenkyu I, Showa 45, No. 99042.

LITERATURE CITED (1) Chemical Abstracts Service, "Standard Distribution Format

Technical Specifications," Columbus, Ohio, 1970. ( 2 ) Chemical Abstracts Service, "Data Content Specifications for CA Condensates in Standard Distribution Format," Columbus. Ohio, 1970. ( 3 ) Anzelmo, F. D., "A Data Storage Format for Information System Files," lEEE Tram. C o n p u t . C-20(1), 39 (1971). (4) Motobayashi. S., Masuda, T., and Takahashi, N., "The HITAC 5020 Time Sharing System," Proc. ACM 24th .Vat. Coni.. p. 419. 1969. (5) Xakata. I., and Hamada. H.. "HITAC 5020 P L j l Compiler." H i t a c h Reu. 17(11),432 (1968). (6) Central Research Laboratory, Hitachi Ltd., "PL/IW Gengo no Shiyo (Specifications of PL/IW Language)," 3rd ed., Tokyo, J a p a n . J a n . 1969 (in Japanese). ( 7 ) Takahasi. H.. Kunii. T. L.. Saiga. I\;.,-.and Nakano. K., "Nijigen Banchizuke Hoshiki ni Yoru 5020 TSS no Gaiyo (An Outline of 5020 TSS with Two-Dimensional Addressing)." Center ,Veic.s (The Computer Centre, Univ. of Tokyo) 2(4), p. 6 (1970) (in Japanese).

Reference Literature to ThermodynamicDiagrams A. L. HORVATH 18, Harlow Close, Thelwall. Nr. Warrington. England Received March 2 1, 197 1

A review of the sources of published thermodynamic diagrams which are frequently used by engineers in research and industry is presented.

Thermodynamic diagrams, such as pressure-enthalpy (p-H), temperature-entropy (T-S), enthalpy-entropy (H-S), volume-enthalpy (v-H), and temperature-enthalpy (T-H) are frequently used by engineers in design calculations, particularly for compressors, refrigerators, and power cycles (Rankine, Carnot, etc.). Several diagrams are available in various textbooks; however, a review of the published charts will assist engineers to a quick and easy selection according to the requirements. This article is based

mainly on secondary sources (textbooks), which are available in most technical libraries. In most cases, the original diagram is presented in an enlarged form with a n accordingly higher accuracy, which is always required by these types of calculations. Tables I and I1 list single inorganic and organic substances arranged in alphabetical order by chemical formula, thus providing a quick method for finding a given compound in the tabulations. The numbers refer to the

Table I. Inorganic Compounds Formula

A -

co CO?

He

lame

Kelrieerant S o .

Argon Air

740 729

Carbon monoxide Carbon dioxide

728A 744

Chlorine Hydrogen (normal) Hydrogen (para) Water

771 702n 702p 718

Helium

704

Reterences

Formula

lb,7b,18b.23c,36ace lb,ib,18b.23b,24ce,25a, 26a,36ace 5a.7b, 13b, 18ab,23c la,5a,Tb.l8b, 19a,23bc, 24a,25a.26a,37abc 15ab,18a,28b lb,5a, 18b,23bc 1bc ,30b 5a,9d, 16c,17t,18bc,23c, 24~,26bc,27t,31ct,33t lbc,18bc,23b,39a

Hg

Same

Retrigerant Si>

900 739

N>

Mercury Potassium Nitrogen

I"

Ammonia

717

Nitrous oxide Sodium Neon Oxygen Sulfur hexafluoride Sulfur dioxide

744A 723 720 732 846 764

K

s,o Na Se 0 2

SFti

so2

728

Keterencec

23c 23c lb,5a.;b,l8b,23b,36ace, 39a la,5a,ib,9a,14a,18b,19a, 24a,25a,26a,39a 23c,24a,25a,26a 23c lb lb,5a,18a,36ace 18ab 5a,24a,25a.26a

Journal of Chemical Documentation. Vol. 1 1 , No. 4. 197 1

231