

NEW TEC CHEMICAL LITERATURE
Presented before the Division of Chemical Literature at the 116th Meeting of the American Chemical Society, September 1949, Atlantic City, N. J.

Conventional and Mechanized Search Methods

S. W. COC...
U. S. Patent Office, Washington, D. C.

J. W. PERRY
Massachusetts Institute of Technology, Cambridge, Mass.

The scientific and technical information stored in modern libraries is a peculiar form of wealth, valuable only if it is used. The more that is accumulated, the harder it is to use it. Indexers and classifiers have developed a high degree of skill and foresight, and the development of machine searching brings selection of scientific or technical information into harmony with the nature of the subject matter being scanned. However, provision must be made for incorporating new concepts and new terminology into the framework of reference used for indexing and coding.

THE scientific and technical information stored in our modern libraries is a peculiar form of wealth. In the first place, such information is valuable only if it is used. Furthermore, the more of such wealth we accumulate, the harder it becomes to use it. Radically new methods of utilizing stored knowledge are essential if the continuing process of accumulation is not to impair its value.

Before considering various devices which may be used to effect the needed improvements, it is well to consider what features characterize scientific and technical information. Because modern science and technology are based on experimentation, we must first direct our attention to the nature of experimental work.

Regardless of apparent diversity, all branches of natural science are based on experiments and observations involving interactions between material objects. The chemist, for example, brings together reactive materials and observes the formation of one or several new substances. In physics, we may use a photoelectric cell to measure changes in light intensity produced by some source. In experimental biology, the influence of some chemical on an organism may be observed. In each case the record of our experimental work specifies the various entities that were observed to interact and the results of such interaction. As a rule, no new terminology need be invented to describe our observations. Similarly, theoretical interpretation of experimental observations consists in discovering and describing correlations which can usually be readily expressed by means of combinations of previously established terms of reference, viz., physical entities and abstract concepts.

It follows, therefore, from the very nature of modern science, both pure and applied, that the record of its accomplishments consists of information concerning interaction between various physical entities and the theoretical interpretation of such interactions. In consulting scientific literature we will, as a rule, be concerned with locating information definable in terms of two or more physical entities or concepts.

This need to locate information on the basis of interacting factors finds its reflection in typical subject indexes. Thus the rather specific entry "soybean flour" in an annual index of Chemical Abstracts was set up as shown in Table I. These index entries provide leads not merely to "soybean flour" but to relationships of that material to other things. The art of indexing consists, to a high degree, in attempting to anticipate future needs for information by setting forth those relationships having the greatest possibility of being useful. It is not physically possible, within the limitations imposed on a printed index, to include all combinations of all factors mentioned in the material being indexed. If this were done, the index would be excessively bulky.
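The bulk problem can be made concrete with a little arithmetic. The following is only an illustrative sketch, not anything taken from the paper: it assumes a single document that mentions n distinct factors and counts how many index entries would be needed to cover every pair of factors, and every combination of two or more factors. The function names are hypothetical.

```python
from math import comb


def pair_entries(n_factors: int) -> int:
    """Number of distinct pairs that can be formed from n factors."""
    return comb(n_factors, 2)


def all_combination_entries(n_factors: int) -> int:
    """Number of subsets containing two or more of the n factors.

    There are 2**n subsets in all; the empty set and the n
    single-factor subsets are excluded.
    """
    return 2 ** n_factors - n_factors - 1


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, pair_entries(n), all_combination_entries(n))
```

Under these assumptions a document with 20 factors would already call for 190 pair entries, and more than a million entries if every combination were listed, which is why a printed index must select the relationships most likely to be useful.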

SUBJECT IND

1456

INDUSTRIAL AND ENGINEERING CHEMISTRY

August 1950

CLASSIFICATION

A similar situation exists in classification. Table II shows a portion of the Patent Office classification of coating and plastic compositions. As in the index, most of the items indented under the main heading refer to very different types of subjects combined therewith.

Table I. Index Entries

  analysis of, report of seed and meal analysis committee on, 7005d.
  bread contg., 7000b.
  in bread (Hungarian), 7546i.
  culinary prepn. and uses of, 1768~.
  detn. in cereal products, 35472.
  effect on alimentary equil., 506e.
  effect on urease in gastric mucosa, 1024b.
  fat content of, susceptibility to attack by Tribolium confusum and, [illegible].
  maintenance utilization of proteins of, growth and, 2131e.

Classification as a method for organizing subject matter is characterized by the selection of certain types of criteria as a basis for dividing the subject into nonoverlapping groups. However, any number of types of criteria can characterize a given subject. For example, bacteria may be classified as to their morphology, the conditions under which they are capable of living or growing, their pathological effects, etc. Chemical compounds may be classified on the basis of their chemical structure, with regard to certain physical properties, with respect to their practical applications, etc. In each case, a classification system consists, in essence, of some permutation of a given type of criteria.

It would be physically impossible to establish as separate classes and subclasses every possible combination of all the criteria pertaining to any given broad group of things, and, even if this could be done, the complexity of the resulting system would tend to defeat its own purpose. The art of establishing classification systems is based on skill in discerning which particular combination of factors is most likely to prove of greatest usefulness to the greatest number of users of the classification scheme.

Persons who consult the record of science and technology have good reason for gratitude to the high degree of skill and foresight which indexers and classifiers have developed. But no amount of human skill in devising indexes and classification schemes can possibly anticipate future trends in scientific research and development. It is precisely in the direction of the unexpected and surprising result that the most spectacular progress is made. It is from the vantage point of such progress that the researcher of the future will wish to search the scientific literature being indexed and classified today.

It has not proved possible in the past for indexers and classifiers to provide means whereby the literature of chemistry (or other field of science and technology) could be easily searched on the basis of combinations of concepts not envisaged at the time of setting up the index or classification scheme. Such searches along new lines have always required considerable ingenuity and effort. With the record of science expanding at its present rate, there is a parallel increase in the amount of effort and time required for searching from points of view other than those set up in indexes and classification schemes.

MACHINE SEARCHING

As long as our basic bibliographic tools were alphabetized lists of index entries printed on sheets or written on file cards, or classification systems worked out in terms of pigeonholes or similar compartments, no major advance in methods of consulting and utilizing scientific information was to be expected. Certain machines now in existence (e.g., punched-card sorters) are able to scan coded index entries and base selection of desired items not merely on index entry A or index entry B but on the simultaneous presence of two or more separately coded entries. This means that such machines can be used to search for combinations of concepts not envisaged at the time the information was analyzed. Machine searches directed to such combinations can be used to sort documents (e.g., patents) into new classes and subclasses corresponding to newly developed points of view. Machine searching when conducted in this fashion breaks down previously insurmountable barriers and brings the selection of needed scientific or technical information into harmony with the nature of the subject matter being scanned.

In order that machine searching may be practical, it is of course necessary that a search shall be complete within a reasonable length of time. Hand-sorted punched cards meet this requirement for information files of modest size embracing up to a few thousand items. For very large files, automatic high-speed equipment will offer irresistible advantages.

It should not be inferred that the machines, no matter how versatile or speedy, can achieve their ultimate possibilities without further effort on our part. Much careful thought and planning will be required. One problem is that of indicating relationships between specific and generic terms (e.g., "soybean flour" and "comminuted plant seed") in such a fashion that searching can be based on either type of concept. Another problem is how to specify different relationships (e.g., "man bites dog" as distinct from "dog bites man") so that the machine can take cognizance of such relationships while searching. Furthermore, provision must be made for incorporating new concepts and new terminology into the framework of reference used for indexing and coding. These and related problems are now under investigation.
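The kind of selection described above can be sketched in modern terms. What follows is a minimal illustration under stated assumptions, not the authors' coding scheme or any actual punched-card layout: each document is represented by a hypothetical Card holding a set of coded terms, generic terms are added from an assumed BROADER table when the card is made, and ordered relationships are stored as (agent, action, object) triples so that "man bites dog" and "dog bites man" remain distinct.

```python
from dataclasses import dataclass

# Assumed generic-term table: each specific term maps to its broader terms.
BROADER = {
    "soybean flour": {"comminuted plant seed"},
}


@dataclass(frozen=True)
class Card:
    doc_id: str
    terms: frozenset                    # unordered coded index entries
    relations: frozenset = frozenset()  # ordered (agent, action, object) triples


def expand(terms):
    """Return the terms plus every broader term reachable via BROADER."""
    seen = set(terms)
    stack = list(terms)
    while stack:
        term = stack.pop()
        for broader in BROADER.get(term, ()):
            if broader not in seen:
                seen.add(broader)
                stack.append(broader)
    return frozenset(seen)


def make_card(doc_id, terms, relations=()):
    """Code a document, storing generic terms alongside specific ones."""
    return Card(doc_id, expand(terms), frozenset(relations))


def search(cards, all_of=(), relation=None):
    """Select cards on which all requested terms are simultaneously present.

    If a relation triple is given, the card must also carry that exact
    ordered relationship.
    """
    wanted = set(all_of)
    hits = []
    for card in cards:
        if not wanted <= card.terms:
            continue
        if relation is not None and relation not in card.relations:
            continue
        hits.append(card.doc_id)
    return hits


# A small illustrative file; the document identifiers are invented.
FILE = [
    make_card("doc-1", ["soybean flour", "bread"]),
    make_card("doc-2", ["soybean flour", "urease"]),
    make_card("doc-3", ["man", "dog"], [("man", "bites", "dog")]),
    make_card("doc-4", ["man", "dog"], [("dog", "bites", "man")]),
]
```

With this file, search(FILE, all_of=["soybean flour", "bread"]) selects only doc-1, a search on the generic term "comminuted plant seed" selects doc-1 and doc-2, and a search on the terms "man" and "dog" alone cannot separate doc-3 from doc-4 until the ordered relation ("man", "bites", "dog") is supplied. Expanding generic terms at coding time is only one possible design; the expansion could equally be applied to the query instead.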

Table II. Patent Office Classification

218  natural resin or derivative containing
219  with fat, fatty oil, fatty oil acid, or salt thereof
220  fatty oil
221  two or more kinds of fatty oil
222  drying oil
223  with sulfurizing or sulfonating agent
224  with wax
225  with bituminous material or tarry residue
226  with terpene or derivative
227  with hydrocarbon
228  with filler dye or pigment
229  with wax, bituminous material or tarry residue
230  with wax
231  ester type wax
232  with bituminous material or tarry residue
233  with sulfurizing or sulfonating agent
234  with hydrocarbon
235  with filler dye or pigment

The advantages of machine searching are by no means confined to improvement in methods for selecting items from a file. The fact that a search can be directed to seeking index entry A when simultaneously present with index entry B means that machine searching will also be effective as a means for seeking and establishing relationships, e.g., of a cause and effect nature.

RECEIVED May 8, 1950.