P. T. Hinde
Boston University Boston, Massachusetts
Using Edge-Notched Cards for Personal Interest Literature Files
In recent years many papers have appeared dealing with the use of edge-notched cards for filing scientific references.¹ Some of these articles deal with fields where logical and complete classification schemes can be devised (e.g., IR spectra, coding of organic compounds by structure and formula). Others have been concerned with specific research projects, or with the problems involved in setting up a large central file for an industrial laboratory. Relatively few papers have dealt with the problems involved in setting up a personal file to contain references relating to the owner's research interests, and most of these have confined themselves to describing a coding system suited only to the author's specialized interests.
This paper attempts to discuss in general terms the problems involved in setting up an edge-notched card file for personal use, rather than to give details of a specific system. Our own system is used as a basis for discussion, but details of the subject coding (covering the main fields of gas kinetics and free-radical reactions) are mentioned only briefly for the purposes of illustration.
In setting up our system we endeavoured to keep the coding sufficiently flexible so that future changes in research interests could be accommodated without any major revision being necessary. This has been accomplished by coding both very general subjects and very specific subject-elements. This approach has something in common with the Uniterm or Keyword method of indexing developed by Mortimer Taube (1). Numerical coding is used only to a limited extent, most information being direct coded.
Choice of Cards
Cards can be readily obtained in sizes from 4 × 6 inches up, punched either with a single row of holes all around, or with two rows of holes along one or more of the edges. For the present application the choice lies between 4 × 6 and 5 × 8 inches; larger sizes are unnecessarily bulky, expensive, and clumsy to sort, and generally have far more holes than are needed. In our system 4- × 6-in. single-row cards are used which have a total of 70 usable holes.² Most 5- × 8-in. cards have about 96 holes. For personal files, which cover a limited range of subjects, we think the smaller single-row cards have sufficient holes for adequate coding.
¹ For comprehensive bibliographies see CASEY, R. S., ET AL., "Punched Cards," 2nd ed., Reinhold Publishing Corp., New York, 1958, and REICHMAN, F., in "The State of the Library Art," R. R. Shaw, editor, Rutgers University Press, New Brunswick, New Jersey, 1961, Vol. 4, Part 1. Both these books give lists of card and equipment suppliers.
² From the Copeland-Chatterson Company, Stroud, Gloucestershire, England.
The advantages and disadvantages of double-row cards will be discussed later in conjunction with the related problem of direct versus numerical coding. Cards which have a slightly condensed alphabet along the top edge, with the other holes serially numbered, are most satisfactory. Custom-printed cards are generally prohibitive in cost for personal files, and they have the major disadvantage that details of the coding are seldom settled when the cards are ordered.
Basic Principles of Coding
In direct coding (or yes-no coding) each hole is assigned a specific meaning or subject. If the hole is notched, then the card in question contains information on that subject. The only serious drawback to direct coding is that only one subject can be coded per hole. If it is desired to code more subjects than there are holes available, some form of numerical coding must be used. Various systems have been described (2) but the simplest, which is more than adequate for a small file, is that known as 7-4-2-1 coding. A four-hole field has values of 7, 4, 2, and 1 assigned to the four holes. By notching one or more holes, any one of up to 14 subjects can be coded in the space of four holes. It is often convenient to limit the number of subjects to nine or ten; up to nine subjects can be coded by notching no more than two holes. By including a fifth hole in the field, designated "N-Z," any one letter of the alphabet can be coded in the space of five holes, the N-Z hole being notched for the latter half of the alphabet only.
The disadvantage of numerical coding of subjects is that, in general, only one subject can be coded in a particular field. With the continual merging of once separate fields of chemistry it is often difficult to prepare a set of mutually exclusive subjects for such a field. At the outset it may appear that such a list has been prepared, but at some future date references will occur which contain information on two or more subjects in the same field, only one of which can be coded. Thus in one of our numerical codes, which gives details of the analytical techniques employed, the subjects "chromatography" and "spectroscopy" are included. Recently there has been great interest in the coupling of a gas chromatography unit with a fast-scanning IR spectrometer in order to identify the separated components. Under which heading is mention of such an apparatus to be coded?
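The 7-4-2-1 field and its five-hole alphabetic extension, described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the rule of always using the fewest holes is our reading of the text (it makes 7 a single notch rather than 4 + 2 + 1), and the assignment of A to 1 through M to 13 within each half of the alphabet is assumed, not taken from the paper.

```python
from itertools import combinations

WEIGHTS = (7, 4, 2, 1)  # values assigned to the four holes of one field


def encode_7421(value):
    """Return the hole values to notch for `value` (1-14), using the fewest holes."""
    if not 1 <= value <= sum(WEIGHTS):
        raise ValueError("a 7-4-2-1 field codes values 1 to 14 only")
    for n_holes in range(1, len(WEIGHTS) + 1):
        for combo in combinations(WEIGHTS, n_holes):
            if sum(combo) == value:
                return combo
    raise AssertionError("unreachable: every value 1-14 is a sum of 7, 4, 2, 1")


def encode_letter(letter):
    """Code one letter in five holes: a 7-4-2-1 field plus an 'N-Z' hole.

    Assumption: A..M and N..Z are each numbered 1..13 within their half.
    """
    index = ord(letter.upper()) - ord("A")  # A=0 ... Z=25
    if not 0 <= index <= 25:
        raise ValueError("expected a single letter A-Z")
    second_half = index >= 13   # the N-Z hole is notched for N..Z only
    position = index % 13 + 1   # 1-13 within each half of the alphabet
    return encode_7421(position), second_half
```

With this convention every value from 1 to 9 needs at most two notches, which is consistent with the remark about limiting a field to nine subjects.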
Experience with our system has convinced us that, for small personal files, as little numerical subject coding as possible should be employed. The facility with which multiple cross-referencing can be achieved with notched cards frequently leads to the devising of overdetailed subject codes. This must be avoided at all costs. It should be remembered that it takes less time to hand-sort 20 cards that have dropped out after sorting for a broad subject than it does to needle-sort them three times to extract the specific subject from a four-hole numerical field. Although the purpose of detailed subject codes is presumably to collect all papers on a specific subject in one place, this often fails in practice, since as subject divisions become increasingly specific it becomes increasingly difficult to assign a paper to one division.³ This leads to unnecessary duplication in coding (the "see also" references of a library catalog). Since a pack of notched cards can be successively "sorted down" through several subjects, the pack can usually be reduced to a small number of relevant cards without the necessity of coding specific, inflexible subjects.
Another disadvantage of over-detailed numerical coding is that reference must constantly be made to subject lists, both when notching and when sorting. This is time consuming and increases the chance of errors in notching.
If, for reasons of space, extensive use of numerical coding is necessary, then some form of superimposed coding (3, 4) may offer a better solution. In these schemes a list of subjects is prepared, and each subject is assigned a random number. These numbers are then coded in one large field of 20 or more holes. Statistical techniques have shown that, provided the numbers were assigned randomly, two or more subjects can be coded in one field without producing an unduly large number of false drop-outs when the cards are sorted.
Problems similar to those encountered in numerical coding will also occur if two-row cards are used. Thus if in a "REACTANT" code on a two-row card an outer and an inner hole were designated "olefins" and "amines," then a reaction in which only an amine is present cannot be distinguished from one in which both an amine and an olefin are present.
As in the case of numerical coding, when the code is devised it may appear that mutually exclusive pairs of reactants have been chosen (i.e., it is assumed that an olefin and an amine will not be present together), but scientific progress or a change in research interest may invalidate this premise at some future time.
Both numerical coding and multiple-row cards are most useful for the filing of numerical data on a compound or reaction. Thus in three 4-hole numerical fields the molecular weight of a polymer can be given to ±10 units for a weight up to 10,000. Whether a molecule has one ring or two, or one double bond or two, are examples of the type of data which can be coded on a double-row card without any possibility of ambiguity, and with considerable saving of space.
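As a closing illustration for this section, the superimposed coding mentioned above can be sketched as follows. This is a minimal sketch and not the scheme of references (3, 4): the field size of 20 holes comes from the text, but the choice of three holes per subject, the function names, and the use of Python's `random` module to assign the codes are all assumptions made for illustration.

```python
import random

FIELD_SIZE = 20          # holes in the superimposed field ("20 or more" in the text)
HOLES_PER_SUBJECT = 3    # assumption: each subject is coded by three random holes


def assign_codes(subjects, seed=0):
    """Give every subject a random set of holes within the one large field."""
    rng = random.Random(seed)
    return {
        subject: frozenset(rng.sample(range(FIELD_SIZE), HOLES_PER_SUBJECT))
        for subject in subjects
    }


def notch(card_subjects, codes):
    """Superimpose the codes of all subjects on a card into one set of notches."""
    notched = set()
    for subject in card_subjects:
        notched |= codes[subject]
    return notched


def drops_out(notched, subject, codes):
    """A card drops out for a subject if every hole of that subject is notched."""
    return codes[subject] <= notched


# A hand-picked (not random) two-hole example of a false drop-out:
# the card is coded for x and y only, yet it also drops out for z.
demo = {"x": frozenset({0, 1}), "y": frozenset({2, 3}), "z": frozenset({1, 2})}
card = notch(["x", "y"], demo)
false_drop = drops_out(card, "z", demo)  # True: z's holes are covered by x and y
```

A card always drops out for the subjects it was coded with; the price of sharing one field is the occasional false drop-out shown at the end, which random assignment keeps rare when the field is large relative to the number of notches per card.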
What to Code
In conventional card files, authors and subjects are usually the only parameters under which a reference is classified. In a notched card file a number of other useful items can be coded without having to prepare duplicate cards or do other additional work. Apart from authors and subjects the following information is coded on our cards (all except item (4), date of publication, are direct coded):
(1) Type of paper
    (a) Theoretical
    (b) Experimental discussion
    (c) Practical technique
    (d) Technical, instrumentation
    (e) General science, misc. topics
(2) Type of chemistry
    (a) Analytical
    (b) Kinetics, physical organic
    (c) Nuclear and radiation chem., ionic reactions
    (d) General physical chemistry, structure determination
    (e) Organic and inorganic preparations, purification of reactants
(3) Type of publication
    (a) Book
    (b) Journal article
    (c) Review
    (d) Data
    (e) Limited circulation reports
(4) Date of publication. A five hole field is assigned the values:
    Hole   Years
    1      Before 1950
    2      1950 + 5   ('51-'55)
    3      1950 + 10  ('56-'60)
    4      1950 + 15  ('61-'65)
    5      1950 + 20  ('66-'70)
    By notching up to three holes, dates up to 1995 can be covered in five year increments.
(5) Miscellaneous
    (a) Reprint held
    (b) Teaching interest
    (c) Idea suggested by reading paper
Other miscellaneous items which have been suggested as suitable for coding are names of journals, language of original, and original not seen (i.e., abstract of an abstract).
All cards are serially numbered and a record kept of the cards used each year. This makes it possible to determine the approximate date on which a card was prepared. Originally, three holes were used to indicate the number of authors (one, two, more than two) but this was found to be little used as an aid to retrieval and the holes were allocated to direct coded subjects.
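The date field of item (4) can be sketched in Python as a small additive code. The hole numbering (1 to 5 within the field) follows the table as reconstructed from the scan; the function names, the largest-increment-first rule for choosing holes, and the treatment of 1950 itself as belonging to the "Before 1950" hole are assumptions made for illustration.

```python
BEFORE_1950_HOLE = 1
# hole number within the field -> years added to 1950
INCREMENTS = {2: 5, 3: 10, 4: 15, 5: 20}
MAX_OFFSET = 45  # 10 + 15 + 20, the largest sum of three holes (gives 1995)


def date_holes(year):
    """Return the holes to notch for a publication year, largest increment first."""
    if year <= 1950:  # assumption: 1950 itself is filed under "Before 1950"
        return [BEFORE_1950_HOLE]
    offset = ((year - 1950 + 4) // 5) * 5  # end of the five-year period, e.g. 1963 -> 15
    if offset > MAX_OFFSET:
        raise ValueError("three holes cover dates up to 1995 only")
    holes, remaining = [], offset
    for hole in sorted(INCREMENTS, key=INCREMENTS.get, reverse=True):
        if INCREMENTS[hole] <= remaining:
            holes.append(hole)
            remaining -= INCREMENTS[hole]
    return sorted(holes)


def period(holes):
    """Decode a set of notched holes back to the (first, last) year of the period."""
    if holes == [BEFORE_1950_HOLE]:
        return None  # open-ended: any date before the 1951-55 period
    end = 1950 + sum(INCREMENTS[h] for h in holes)
    return end - 4, end
```

Under these assumptions a paper published in 1963 needs a single notch (the 1950 + 15 hole), which agrees with the specimen card, where one hole denotes "published between 1961 and 1965".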
Specimen card after notching. The notches have the following meanings:
    Hole 5: Experimental discussion
    Hole 10: Kinetics and physical organic chemistry
    Holes 15, 17: Oxygenated organic compounds
    Hole 19: Oxidation of organic compounds
    Hole 22: Work concerned with HO2 or OH radicals
    Holes 31, 33: Reactions done in a flow system
    Hole 37: Published between 1961 and 1965
    Hole 43: Journal article
    Hole 0: Reprint on file (also denoted by R in top left)
³ The danger in selecting very specific subjects is well illustrated by a note in the introduction to a recent edition of Dewey's Decimal Classification, to the effect that certain subjects and their classification numbers have been omitted since a search of the Library of Congress revealed that no books were classified under these subjects.
Volume 42, Number 10, October 1965
Author Code
We have used a direct code for the first letter of the author's name. In a few cases two or more uncommon letters are assigned to one hole. All the authors of a paper are normally coded. Experience has suggested that better retrieval would be achieved in the same space if two or three letters were assigned to each hole. The holes remaining would then be similarly coded to indicate the second letter of the senior author's name. A useful list of the frequency of initial letters of surnames found in the chemical literature has been given by Cox, Casey, and Bailey (5).
If it seems that an excessive amount of space is used by direct coding of authors, then numerical coding may be used. One complication which then arises is the coding of multiple author papers. A convenient arrangement is to provide three fields. For single authors the first three letters of the name are coded. For two or more authors the first letters of the surnames of the first two and the last author are coded. A 7-4-2-1 code plus an N-Z hole may be used, or the mnemonic code devised by Casey, Bailey, and Cox (6). In this latter code a five hole field is assigned the values O I E C B; the necessary punching combinations are more easily memorized than when numerals are used to designate the holes.
Subject Coding
The subject coding in our system consists of three 4-hole numerical fields and 12 direct coded subjects. The "Type of Chemistry" code referred to earlier may also be considered to be a very general subject code. A number of vacant holes remain on the cards, which will be used to accommodate additional direct coded subjects.
The numerical fields are designated "Principal Reactant" (A), "Type of Reaction" (B), and "Experimental Factors" (C). In code A the limitation of one entry per field seldom causes problems. Code B is the least satisfactory. The classification of a radical reaction depends very largely on the aspect which is of interest (the fact that an inorganic oxidation reaction also involves a reduction illustrates the problem). Consequently, most of the direct coded subjects are descriptors for "Type of Reaction" in order to supplement this code.
In code C are entered such factors as use of high or low pressure, flow system reactions, and use of mass spectrometry, chromatography, or conventional spectroscopy for analysis. Code C is not of such fundamental importance as the other codes and is seldom used for primary searching. Most often it is used to reduce the number of cards after an author or subject sort, or as a rejection code; e.g., in a pack of 50 cards on hydrocarbon oxidation (a direct coded subject) we are not interested in flow system studies but are interested in static studies using mass spectrometric analysis. Sorting for "mass spectrometry" might not give us all relevant cards, since some mass spectrometric work might have relied more on chromatography for analysis, and perhaps was so coded. But if "flow systems" is sorted, a number of cards will drop out which can be rejected with the certainty that they are irrelevant.
Another possible way of successfully coding non-exclusive experimental parameters would be to establish an order of precedence for the subjects. Thus it might be decided that this order be established: mass spectrometry, high temperature work, high pressure work, flow system studies. Then any paper dealing with mass spectrometry would be so coded, any paper dealing with high temperature work would be so coded unless mass spectrometry was also used, and so on.
Journal of Chemical Education
Devising Subject Codes
The first essential point to keep in mind when setting up a notched card file is that it requires a quite different approach to subject classification than does a normal card file. Many of the principles of library classification are no longer valid. Even if a large conventional card file exists, no attempt should be made to use the same subject groupings, for to do so will nullify many of the advantages of notched cards.
Another essential rule is that, irrespective of whether the notched card system is being built up from a conventional card file, no punching should be done until several hundred cards have accumulated. A partial exception may be made in the case of author codes, date codes, and other non-subject-oriented information, but even in the case of these factors temporary postponement of punching is best. After a tentative subject code has been worked out, the amount of space available for author and miscellaneous codes can be better estimated.
The relative importance of subject, author, and ancillary codes depends very much on the user's work. Thus for someone engaged in patent work, the date of publication (or of issuance of a patent) may be of vital importance. To someone interested in chemotherapy, an author code may be of little value since many of the cards will contain data on a specific compound (from various sources) rather than reference to a particular paper.
Two specific points which should be borne in mind as guiding principles are:
(a) Decide which aspects of a subject are of most interest and select subjects for coding on the basis of their retrieval possibilities rather than the ease with which they fit conventional titles of papers.
(b) Two types of subjects should be coded: very broad subjects and very specific subject-elements or keywords. It is easier to file and retrieve papers using several broad subjects than to try to decide which specific subject a paper fits into best.
The broad subjects will never become outdated, while the specific subject-elements, which are normally sorted for only in combination with broad subjects, can in the future be combined in ways which were not thought of at the time of setting up the file.
Both the above points conflict with conventional classification and indexing practice. In most non-punched card files for personal use the cards are divided into perhaps a dozen or more broad subjects, and then filed alphabetically by author. The subjects are so chosen that most papers will fit into only one of them, thus reducing the need for duplicate cards as cross-references. Since notched cards can be cross-referenced so easily it is unnecessary to select subjects so that papers will "fit" into one of them. Better retrieval and better utilization of the characteristics of notched cards will be attained when most papers are coded under two or three general subjects. It must be remembered that notched cards can be searched for a combination of subjects or subject-elements. Also,
they can be searched consecutively, moving from a general to a more specific subject. Thus a general subject like "thermal decomposition" can be usefully coded. The large number of cards retrieved when this subject is sorted for can be further sorted under more specific topics. Since specific topics are inflexible and wasteful of space, further coding of the subject of a paper is best done as a series of fragments or subject-elements. Thus in our system a paper on "The Thermal Decomposition and Oxidation of Olefins" can be coded under "thermal decompositions," "olefins," and "hydrocarbon oxidation." Likewise a paper describing instrumental improvements in the gas chromatographic analysis of amines would be coded under "chromatography," "non-cyclic nitrogen compounds," and "technical and instrumentation."
This approach of breaking up a subject into keywords or subject-elements is the direct antithesis of the Chemical Abstracts maxim "index subjects, not words," as applied to the preparation of a conventional index. But one of the main distinctions between a conventional index and a notched card index is that the former can code (i.e., list) as many specific subjects as desired but cannot be selectively searched from a general to a specific subject, or for a combination of keywords. A notched-card index, on the other hand, can accommodate only a limited number of subjects but these can be searched consecutively or in combination.
When the system is being set up, the choice of subject-elements to be coded is best made by examining the cards already prepared, and considering current research interest. If true subject-elements rather than subjects are selected, they will often be found later to be applicable to the coding of other papers on related subjects. Thus an interest in the acetone photosensitized decomposition of hydrocarbons might result in the subject-elements "chemical sensitization" and "UV radiation," with "hydrocarbon decomposition" as a general subject.
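The consecutive search from a general subject to more specific subject-elements, described above, can be sketched in Python by treating each card as a set of direct-coded subjects. The function names and the card representation are hypothetical; the first two cards use the codings quoted in the text, while the third card is invented purely for illustration.

```python
def needle_sort(pack, subject):
    """Split a pack into (dropped, retained) for one direct-coded subject.

    Cards notched for `subject` fall off the needle; the rest stay on it.
    """
    dropped = [card for card in pack if subject in card["notched"]]
    retained = [card for card in pack if subject not in card["notched"]]
    return dropped, retained


def sort_down(pack, subjects):
    """Needle-sort repeatedly, keeping only the cards that drop out each time."""
    for subject in subjects:
        pack, _ = needle_sort(pack, subject)
    return pack


pack = [
    {"title": "The Thermal Decomposition and Oxidation of Olefins",
     "notched": {"thermal decompositions", "olefins", "hydrocarbon oxidation"}},
    {"title": "Gas chromatographic analysis of amines",
     "notched": {"chromatography", "non-cyclic nitrogen compounds",
                 "technical and instrumentation"}},
    # Hypothetical third card, not taken from the paper:
    {"title": "Thermal decomposition of a ketone",
     "notched": {"thermal decompositions", "oxygenated organic compounds"}},
]

broad = sort_down(pack, ["thermal decompositions"])
narrow = sort_down(pack, ["thermal decompositions", "olefins"])
```

Here `broad` holds the two thermal decomposition cards and `narrow` only the olefin paper; the same `needle_sort` can serve as a rejection step by keeping the retained cards instead of the dropped ones.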
Likewise, an interest in the mass spectrometric study of the products of high temperature hydrocarbon decomposition would have resulted in the subject-elements "mass spectrometry" and "high temperature," with the same general subject of hydrocarbon decomposition. If at some future time a paper describing a mass spectrometric study of the products of acetone photosensitized decompositions becomes of interest, it can be adequately coded using existing subject-elements. Likewise a paper on high temperature decomposition of hydrocarbons initiated by vacuum ultraviolet radiation can be coded without having to expand the coding.
A particular warning should be given against trying to base a notched card system on any existing classification scheme (6). To try to apply a scheme such as the Universal Decimal Classification to a personal notched
card file will be doubly disastrous. First, the use of such a comprehensive scheme for a personal file will result in large blocks of unused numbers. Second, for the reasons given earlier in this section, a classification scheme for a conventional file is far from ideal for use with a notched card file, since the coding requires a quite different approach.
A similar warning should be given as to the inadvisability of trying to use preprinted cards, sold as suitable for chemical use. These cards are invariably devised for use in a large central file in an industrial laboratory. Use of such cards will result in many unused positions, and in subjects being forced into the most appropriate printed hole, rather than the coded subjects being chosen by the user to suit his particular research interests.
Conclusions
Setting up a notched card file requires a different approach to subject selection for coding than does a conventional card file or printed index. Coding should include subjects so broad as to be virtually useless in a conventional file, and specific subject-elements which may be meaningful only in combination. The library classifiers' maxim of "code the specific, not the general" need not apply, since consecutive sorting of general subjects is possible with notched cards. Similarly, the CA maxim, "index subjects, not words," can be ignored, since the words (subject-elements) can be searched for in conjunction with their general subjects.
Numerical subject coding should be used as little as possible. In a system which is continually expanding with regard to range of subjects covered, it is difficult to select a set of mutually exclusive subjects for a code field. If reasons of space make numerical subject coding essential, then random superimposed coding may offer a better solution to the problem.
Coding should be kept simple; even with the most elementary coding, a notched card file offers vastly improved retrieval possibilities over a conventional file. Finally, do not begin notching until 200 or more cards are available as a basis for setting up a subject code, and leave adequate space on the card for adding more direct coded subjects or subject-elements.
Literature Cited
(1) TAUBE, M., in "The Technical Report" (Weil, B. H., editor), Reinhold Publishing Corp., New York, 1954.
(2) CASEY, R. S., ET AL., "Punched Cards," 2nd ed., Reinhold Publishing Corp., New York, 1958.
(3) Ibid., Chap. 21 and 23.
(4) SOPER, A. K., Aslib Proc. 7, No. 4, 251 (1955).
(5) COX, G. J., CASEY, R. S., AND BAILEY, C. F., J. CHEM. EDUC. 24, 65 (1947).
(6) CASEY, R. S., BAILEY, C. F., AND COX, G. J., J. CHEM. EDUC. 23, 495 (1946).