
INDUSTRIAL AND ENGINEERING CHEMISTRY

Vol. 40, No. 3

strated need and adequate incentive, American chemists will quickly demonstrate in this professional field the genius for organization and development which is a national characteristic.

The second is the claim that American chemistry cannot finance such a venture. No doubt this bogey has confronted every undertaking which the AMERICAN CHEMICAL SOCIETY has ever initiated. In the contemplation of any desired objective, it is not only proper but essential to count the cost. That operation, however, comes later when skilled fiscal officers have definite proposals on which to make specific estimates; it is premature to obscure our vision in its early stages.

The professional society has many functions. With greatly increased size some of these (such as the national meetings) become conspicuous to a degree disproportionate to their real importance. The nucleus of professional society responsibility involves the development and execution of those technical enterprises which, by their nature or magnitude or both, are inherently impossible to the individual components of such professional groups. Of such major responsibilities of the professional society that pertaining to the correlation and organization of the facts of chemistry is undeniably the most fundamental.

LITERATURE CITED

(1) AM. CHEM. SOC., Chem. Eng. News, 21, 658 (1943).
(2) AM. CHEM. SOC., Public No. 358, 75th Congress, Chap. 762, First Session, H.R. 7709; IND. ENG. CHEM., NEWS ED., 16, 45-53 (Jan. 10, 1938).
(3) Ball, N. T., Science, 105, 34-6 (1947).
(4) Huntress, J. Chem. Education, 15, 303-9 (1938).
(5) Huntress, Nucleus, 20, 161-4 (April 1943).

RECEIVED August 28, 1947. Presented before the Division of Chemical Education, Symposium on Chemical Literature, at the 111th Meeting of the AMERICAN CHEMICAL SOCIETY, Atlantic City, N. J.

Indexing, Classifying, and Coding the Chemical Literature

JAMES W. PERRY
Massachusetts Institute of Technology, Cambridge, Mass.


present time the problem of providing systems for locating information has been approached along two fairly distinct paths: indexing and classification. Both of these conventional methods have their enthusiastic proponents. Librarians and other persons doing general reference work usually rely on indexes and make scant use of classification. Classification experts in the U.S. Patent Office, on the other hand, are convinced by extensive experience that a well designed classification system is so useful in locating technical information as to be virtually a necessity.

These inclinations to favor either indexing or classification have their origin in different requirements of persons using systems for locating information. The Patent Office, as a matter of daily activity, is required to pass on the patentability of broad claims presented in patent applications, and must be able to review all the literature relating to a broad generic subject. In contrast, general reference work in libraries either does not require exhaustive search of a broad subject, or else deals with questions of narrow scope directed to some specific bit of information. As a consequence, reference workers in libraries usually are familiar with and prefer indexes which lead directly to individual portions of information, whereas the Patent Office prefers classification, which groups together patents (or papers) dealing with closely related subject matter.

In chemical industry, in setting up and prosecuting research programs, it is often necessary to collect all the information relating to a rather broad subject. Preparation of such compilations has become increasingly expensive as the volumes of the abstract journals have accumulated year by year. At present, searching a broad subject in comprehensive fashion using existing indexes is costly in time and consequently in money.
Saving time and money by not collecting all information pertinent to a research program is generally regarded as an unreasonably risky gamble.

Existing methods of indexing the abstract journals are reasonably effective in making searches directed to a given compound, e.g., monochloroacetic acid. Although the steadily increasing amount of mechanical effort involved in using existing indexes has led to proposals to mechanize the process of using such indexes to prepare bibliographies, no strong incentive exists radically to change our philosophy of indexing as long as we limit our activities to locating bits of information identifiable by a specific index entry or to preparing incomplete reviews relating to broad generic subjects.

Existing indexes, however, often prove inconvenient in exhaustive generic searches and may be virtually useless. Thus the conventional type of index can be used only with great labor and difficulty to locate all compounds containing an alkyl group of more than six carbon atoms and a solubilizing group derived from sulfuric acid, i.e., a sulfonic acid, a sulfuric acid ester group, or salts thereof.

The question arises whether indexing could in theory be developed to be the most efficient basis for making generic searches. The answer is yes. However, conventional methods of preparing indexes in printed form have proved an effective deterrent to developing the full possibilities inherent in the basic concept of indexing. Obviously, it is theoretically possible to include, as part of an index, entries relating to every generic and subgeneric attribute of each item covered by the index and also to include every permutation and combination of such criteria. However, if this were done the printed index would be extremely bulky and consequently excessively costly to prepare. In fact, mere bulk might defeat its purpose.

As a result, conventional indexing makes sparing use of generic headings, and only those abstracts or papers which discuss the subject matter broadly or in truly generic fashion are entered under generic headings. For this reason, it is impossible to rely on generic headings in conventional indexing to provide more than a small fraction of the information pertinent to a generic subject. For example, the usual conventional index will not list under "olefins" all papers relating to ethylene, propene, butene, etc.
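The bulk of an exhaustive printed index can be made concrete with a small count. The following is a minimal sketch in Python, not a method taken from the article: the function name `exhaustive_entries` and the attribute list are hypothetical, and only combinations are counted (counting permutations as well would make the figure larger still).

```python
from itertools import combinations

def exhaustive_entries(attributes):
    """Return every non-empty combination of one item's attributes.

    Each combination stands for one heading under which the item
    would have to be entered in a fully exhaustive printed index.
    """
    attrs = sorted(attributes)
    return [combo
            for size in range(1, len(attrs) + 1)
            for combo in combinations(attrs, size)]

# Hypothetical attributes of a single compound (illustrative only).
item = {"alkyl chain > 6 carbons", "sulfonic acid", "salt", "surface active"}

entries = exhaustive_entries(item)
print(len(entries))  # 2**4 - 1 = 15 headings for one item with 4 attributes
```

An item with k attributes needs 2^k - 1 headings under this scheme, so ten attributes already call for 1023 entries per item, which suggests why printed indexes make sparing use of generic headings.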
Exhaustive generic listing seems impractical with indexes printed in the usual way.

The Patent Office has found conventional classification a more useful tool than conventional indexing. Patents are granted in this country for compositions of matter, useful arts, machines, new manufactures, and improvements in these categories of subject matter. (Certain types of botanical plants were also declared patentable by relatively recent legislation, but as yet

March 1948


this has had little effect on the U. S. Patent Office's classification system.) These restrictions as to type of subject matter have largely relieved the Patent Office of the burden of searches directed to questions solely or principally concerned with fundamental scientific theory and other unpatentable subject matter.

However, even in the Patent Office shortcomings of classification have become evident. In setting up a classification system, it might be theoretically possible to provide a separate class and subclass for each permutation and combination of the various characterizing criteria. In Patent Office practice, this is impossible because the number of classes and subclasses would become astronomical. As a result, the most carefully devised classification system frequently fails to meet the needs of a person who is interested in some peculiar permutation of criteria which runs crossways to the classification system as set up.

The frequency with which a classification system proves unable to aid effectively in locating information tends to increase with the number of items encompassed. Greater and greater ramification is needed to cope effectively with the increasing number of items and the classification system is virtually forced to utilize less and less essential characteristics as differentiating criteria. This means that the single (or essentially single) permutation of criteria possible in any practicable classification scheme tends to be based to a greater and greater degree on less important criteria. A single permutation of criteria of decreasing importance must tend to become of decreasing general usefulness.

These inadequacies of conventional indexing and conventional classification have a practical rather than a theoretical basis.
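The difficulty of a search that runs crossways to a classification can be sketched as follows. This is an illustrative Python model under stated assumptions, not Patent Office practice: the records, class names, and functions `build_classification`, `search_along`, and `search_across` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (item, class, subclass). The class is the one
# criterion the scheme was built around; the subclass is secondary.
PATENTS = [
    ("P1", "detergent", "sulfonate"),
    ("P2", "dye", "sulfonate"),
    ("P3", "detergent", "phosphate"),
    ("P4", "lubricant", "sulfonate"),
]

def build_classification(records):
    """File each item under class, then subclass (one fixed order)."""
    tree = defaultdict(lambda: defaultdict(list))
    for item, cls, sub in records:
        tree[cls][sub].append(item)
    return tree

def search_along(tree, cls):
    """A search that follows the scheme: open one class only."""
    subclasses = tree.get(cls, {})
    return sorted(i for items in subclasses.values() for i in items)

def search_across(tree, sub):
    """A search that cuts across the scheme: every class must be opened.

    Returns the hits and the number of classes that had to be visited.
    """
    hits, visited = [], 0
    for subclasses in tree.values():
        visited += 1
        hits.extend(subclasses.get(sub, []))
    return sorted(hits), visited

tree = build_classification(PATENTS)
print(search_along(tree, "detergent"))
print(search_across(tree, "sulfonate"))
```

In this sketch the search by class opens a single branch, while the search by subclass must visit all three classes; the labor of the crossways search grows with the number of classes rather than with the number of hits.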
It is not so much the basic philosophy of indexing or of classification that leads to present difficulties, but rather the inadequacies of the tools conventionally used in the past for activating an index or a classification system.

We now have available mechanical tools, both simple and complex, which offer promise for escaping the limitations previously imposed on indexing and classifying systems. For example, punched cards are able to register separately and independently a fairly large number of criteria which may characterize any single entity among a group of its fellows. Punched cards, furthermore, permit us to use any desired combination of such criteria in carrying out a search to isolate certain entities characterized by the desired combination of criteria. Owing to the limited number of holes that may be punched in a given card, considerable ingenuity may be required successfully to cope with the large number of criteria necessary to characterize chemical subject matter. Such mechanical difficulties may perhaps be avoided by using other mechanical or electronic devices. However, regardless of the scope of possibilities inherent in any given mechanical or electrical device, any such device must be regarded as a tool rather than as a thinking machine.

The mechanization of a file of information requires that the various items of information be earmarked to permit mechanical or electronic sorting operations; the items must be provided with certain designations before the file of information can be submitted to machine sorting. The question then arises as to the philosophy which should be followed in providing the designations to be used as a basis of machine sorting. Obviously it would be possible to mechanize a conventional index characterized by limitations of the sort discussed.
If the mechanized system were to be set up to search for narrow fields of information, it would be best to effect the mechanization of a conventional index, probably in coded form. Such mechanization should effect considerable savings in time now wasted in handling books, taking notes, and the like. It would be equally possible to develop a scheme for mechanization of conventional classification. However, the mechanization of conventional indexes and conventional classification schemes would fail by a wide margin to extract from mechanization the full benefits which it promises.
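The way a punched card registers independent criteria and permits a search on any combination of them can be modeled with one bit per hole. The sketch below is an assumption-laden illustration in Python, not a description of any actual card layout: the hole assignments, the deck, and the functions `punch` and `sort_cards` are hypothetical.

```python
# Hypothetical hole assignments: one hole position per criterion.
CRITERIA = [
    "long alkyl chain",      # alkyl group of more than six carbon atoms
    "sulfonic acid",
    "sulfuric acid ester",
    "salt",
    "olefin",
]
HOLE = {name: 1 << position for position, name in enumerate(CRITERIA)}

def punch(*criteria):
    """Return a card (an integer) with one hole punched per criterion."""
    card = 0
    for name in criteria:
        card |= HOLE[name]
    return card

def sort_cards(deck, all_of=(), any_of=()):
    """Select cards having every hole in all_of and, when any_of is
    given, at least one hole in any_of."""
    required = punch(*all_of)
    alternatives = punch(*any_of)
    return sorted(
        name for name, card in deck.items()
        if (card & required) == required
        and (alternatives == 0 or (card & alternatives) != 0)
    )

# Hypothetical deck of four cards (illustrative only).
DECK = {
    "sodium dodecyl sulfate": punch("long alkyl chain", "sulfuric acid ester", "salt"),
    "dodecylbenzenesulfonic acid": punch("long alkyl chain", "sulfonic acid"),
    "methanesulfonic acid": punch("sulfonic acid"),
    "ethylene": punch("olefin"),
}

# The generic search discussed earlier: a long alkyl group together
# with a solubilizing group derived from sulfuric acid.
print(sort_cards(DECK,
                 all_of=["long alkyl chain"],
                 any_of=["sulfonic acid", "sulfuric acid ester"]))
```

The criteria are punched once per item, and the combination is chosen only at the time of the search, so no combination has to be anticipated when the file is prepared. The fixed length of `CRITERIA` corresponds to the limited number of holes on a card.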


The most efficient use of the new mechanization techniques is impossible without careful study of the relationship of machine operations to the basic concepts on which indexing and classifying are based. This involves not only intricate problems of nomenclature and semantics, but also such very practical problems as the expense involved in establishing and maintaining mechanized files of information.

GENERALIZED INDEXING

It would be theoretically possible to use either indexing or classification to set up a system in which every generic or subgeneric criterion characterizing each item would be listed and every permutation and combination of such criteria would be included. It would probably be possible to use either indexing or classification procedures as a starting point. Such an idealized system, on the one hand, would partake of the nature of classification in so far as it would be based on criteria which form the basis of conventional classification, and on the other hand would partake of the nature of indexing, in that it would consist of an array of entries of generic, subgeneric, and nongeneric scope.

Once a mechanized file had been set up using a properly selected group of criteria, the user would be in position to define his scope of search in terms of independent criteria. He could use fundamental criteria to define such generic and subgeneric index headings as might be convenient or necessary for his immediate needs. In terms of conventional classification, the user of the system might be regarded as combining independent fundamental basic criteria to define special classes and subclasses of immediate interest.

A list of the criteria or attributes whose combination could be used to define a field of interest might be regarded as forming a generalized index. Such an index would include generic, subgeneric, and specific terms capable of characterizing the subject matter under consideration. In drawing up the list of fundamental criteria for any given branch of chemistry, it would be necessary to select carefully those which would effectively characterize the subject matter under consideration, and particularly to seek out all which would be necessary to characterize the subject matter from different points of view. It might prove advisable to carry on a process of selection to eliminate tendencies to useless redundancy.
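A generalized index of this kind can be pictured as a list of criteria, each pointing to the items it characterizes, from which the user assembles a class at the time of search. The following Python sketch is offered only as an illustration under assumptions: the file contents and the functions `build_generalized_index` and `define_class` are hypothetical and are not drawn from the article.

```python
from collections import defaultdict

# Hypothetical mechanized file: each item is earmarked with
# independent generic, subgeneric, and specific criteria.
FILE = {
    "paper-1": {"ethylene", "olefin", "polymerization"},
    "paper-2": {"propene", "olefin", "oxidation"},
    "paper-3": {"butane", "paraffin", "oxidation"},
    "paper-4": {"butene", "olefin", "polymerization"},
}

def build_generalized_index(file):
    """Map every criterion to the set of items it characterizes."""
    index = defaultdict(set)
    for item, criteria in file.items():
        for criterion in criteria:
            index[criterion].add(item)
    return index

def define_class(index, all_of=(), none_of=()):
    """Combine independent criteria into a class defined by the user.

    Items must carry every criterion in all_of and none in none_of.
    """
    result = set().union(*index.values())
    for criterion in all_of:
        result &= index.get(criterion, set())
    for criterion in none_of:
        result -= index.get(criterion, set())
    return sorted(result)

index = build_generalized_index(FILE)

# The alphabetized list of criteria is the generalized index itself.
print(sorted(index))

# A generic heading and two special classes assembled on demand.
print(define_class(index, all_of=["olefin"]))
print(define_class(index, all_of=["olefin", "polymerization"]))
print(define_class(index, all_of=["oxidation"], none_of=["olefin"]))
```

Because each paper on ethylene, propene, or butene also carries the generic criterion "olefin" in this sketch, the generic heading retrieves all of them, which the conventional printed index was said not to do.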
In contemplating the problem of establishing a generalized index for a given field, it is necessary to keep in mind the use of such an index by a person not proficient in that field. This may make it necessary to provide the generalized index both in alphabetized form and in a supplementary form in which index entries would be grouped according to various narrower fields to which they pertain. This would make it possible, for example, for a person relatively unskilled in metallurgy to consult a section of the generalized index devoted to ore processing methods. By grouping the various index entries according to rather broad subdivisions of subject matter a path would be provided along which the novice might proceed to establish contact with criteria which he could use to define the scope of information of interest to him. Thereafter, the criteria selected would be used as the basis for activating the mechanical devices in searching the mechanized information file.

The philosophy of indexing and classifying requires careful thought for its development. The picture is not simplified by the possibility that punched cards or similar mechanical devices may remove certain serious limitations inherent in conventional methods of indexing and classifying. In fact, removal of limitations previously imposed on indexing and classifying presents possibilities whose practical and efficient utilization constitutes a problem of exceptional difficulty.

RECEIVED August 28, 1947. Presented before the Division of Chemical Education, Symposium on Chemical Literature, at the 111th Meeting of the AMERICAN CHEMICAL SOCIETY, Atlantic City, N. J.