Problems of the Scientific Literature Survey¹
GUSTAV EGLOFF, MARY ALEXANDER, and PRUDENCE VAN ARSDELL
Universal Oil Products Company, Chicago, Illinois
¹ Presented at the 105th meeting of the American Chemical Society in Detroit, Michigan, April 12, 1943, before the Division of Chemical Education.
Researches required for the war effort have intensified a problem that has been current since chemistry became a science: that of preparing a complete background of the literature available before proceeding with experimental work. The problems of a literature survey have not received the attention they deserve. Many chemists meet their first research problem with inadequate preparation for conducting literature searches in a reasonable length of time. Some are not even acquainted with the titles of the more prominent journals in their own language. For these people, the attitude has been one of finishing a minimum amount of library work in order to proceed to the laboratory part of the problem. Much valuable time has been wasted duplicating studies that have already been published. Had a complete search of the literature been made, the time lost could have been more profitably spent in pursuing a new line of research. In addition to avoiding laboratory duplication, the review of related work may bring forth many profitable ideas.
In a literature survey of any dimension, a tentative method of its organization is mapped out. However, many changes and refinements take place as experience is gained in the studies and collection of material. In laying a foundation for such a research, it is desirable to confine the preliminaries to the standard sources such as: Chemical Abstracts (1), Chemisches Zentralblatt (2), and the British Chemical Abstracts (3). To assure more thoroughness, Beilstein (4), Richter-Anschütz (5), Landolt-Börnstein Tabellen (6), and the "International Critical Tables" (7) should be consulted for organic chemistry and the standard inorganic reference sources for inorganic chemistry. However complete and accurate the secondary sources just mentioned may seem, only the original articles to which they refer should be used. This is no adverse reflection on these secondary sources, because for many purposes they are excellent and quite sufficient, but due to the volume of material that must be covered, omissions and errors inadvertently creep into these abstracts. There is also a tendency for an abstractor to do some editing of the work in line with the type of study in which he is interested, to round off figures in physical constants, and to draw conclusions not intended by the author of the original article. In Beilstein, the editors have taken the liberty of assigning definite structures to compounds for which no structure was so much as implied by the experimenter himself. For example, in Volume V, page 650, 4-isopropyl-stilbene is attributed to Michael (Am. Chem. J., 1, 312 (1879)) who records it simply as isopropyl-stilbene. Another example is the compound assigned the structure of β-phenylnaphthalene and attributed to Breuer and Zincke (Ber., 11, 1403-7 (1879)) who gave no account of its structure. In rare instances, the originals are not available, hence it is desirable to give the abstract data. In such cases, it is necessary to specify that the material has been taken from a secondary source so that the data will be considered with mental reservations.
In order to illustrate these general statements which are applicable to all surveys, we will cite various examples found in conducting the literature surveys for the compilation of reliable work on physical constants and chemical reactions of hydrocarbons. These studies are a part of the Universal Oil Products Company's program for the exhaustive studies of hydrocarbons. The aims are set forth as follows:
1. To eliminate duplicated effort resulting from each worker in hydrocarbon research having to consult several original references in order to identify his compounds.
2. To correlate the best statistical data into one value at standard conditions of temperature and pressure.
3. To make available a complete search for laboratory workers who are hampered by lack of adequate library facilities, lack of time, lack of training, and general disinclination.
4. To develop a means of mathematical prediction of the theoretical physical constants of missing isomeric hydrocarbons of any homologous series, through the compilation of a complete set of the available constants for any series. Evaluation of types of substitutions on either aliphatic, alicyclic, or aromatic groups could be made and fundamental differences between these groups could be checked.
With these ideas in mind, the work gained considerable momentum, not to mention many complications. The first problems encountered in such an undertaking were in the location of the material on hydrocarbons, since a page by page perusal of all journals was not practical. A fairly complete coverage of references is found collectively in Beilstein up to 1910, Chemisches Zentralblatt from 1900 to 1939, and Chemical Abstracts from 1920 to the present; nevertheless, Beilstein Supplements I and V, Chemical Abstracts from 1907 to 1920, and the Landolt-Börnstein Tabellen contained a sufficient number of additional references to warrant coverage. Our method of coverage included a check
of Beilstein Volumes I and V and Supplements I and V and indexes of Chemisches Zentralblatt from 1900 to 1939, as well as Chemical Abstracts indexes from the beginning through 1941. The 1942 abstracts were covered in a page by page inspection of sections: General and Physical Chemistry; Organic; Fuels and Carbonization Products; and Petroleum, Lubricants, and Asphalts. Landolt-Börnstein was checked by page inspection. References given to related work in original publications were checked, but these were often highly inaccurate both as to their bibliography reference and quotations. Review articles could not be depended upon for completeness; these, however, were always checked for references and data quotations.
All these references were organized alphabetically with two sets of cards for each reference. The first set were 6 × 9 in. cards, to contain all the data from the originals, and the second set, containing author's name and reference, were made on 3 × 5 in. cards and were used as a permanent bibliography index.
The earlier literature, bearing a publication date prior to 1910, contains much data which were good for that time, but improved instruments for measurement, later knowledge, and better technique of purification discredit somewhat the numerical values reported. These carefully determined earlier figures were valuable in setting up a comparative background for laws, equations, and generalizations, but for our purposes in setting up the best numerical values, they are generally of little consequence, although some of the data are surprisingly accurate.
The accuracy of a given physical constant is not necessarily proved because it appears many more times than another value. Generally the repetition occurred as the result of a number of authors performing a routine identification and checking their values by what is present in the literature; consequently these values are placed again in the literature as new determinations.
These authors are not to be censured for this procedure since they were probably interested in other problems than determination of physical constants.
The most desirable type of physical constants data is not plentiful in the literature. This is to be expected since most constants were not published with the idea of using them in a publication of the nature of these volumes on "Physical Constants." Accuracy is definitely related to the type of research for which the numerical values were determined. Some determinations have been carried out with extreme precision to find the best value of the constant. The data from some physical-chemical research, for which especially purified compounds, carefully calibrated apparatus, and exact determinations are necessary, are of excellent character. In some cases the data given for a newly discovered compound have a high degree of accuracy. Constants for the purpose of identification are often worth recording, but much of this work is poor, due to the fact that the experimenter made one or two derivatives of the compound, and if these
checked within a few degrees of the "accepted" physical constant values the identification was considered sufficient. When the constant is given as a criterion of purity it is seldom good enough to be recorded. The constants published on narrow boiling fractions of petroleum are rarely of any value and are automatically excluded.
Many problems arise because of the author's lack of clarity on certain points. Often it is questionable whether the constant is of the compound itself or of a derivative. In the following statement it is implied that 116° is the melting point of the derivative, which it is, but it is literally stated that the melting point is of the hydrocarbon: "From the fraction 235-245° picric acid precipitated a picrate of a deeper yellow color; after repeated crystallization from alcohol it melted at 116°, the correct melting point for α-methylnaphthalene" (8). When a compound of definite structure, such as 1,2-dimethylnaphthalene, is discussed, and after a long discourse on preparation a melting point is given for dimethylnaphthalene, the reader is at a loss to know whether this constant is of a compound identified as the 1,2-structure or merely a dimethyl compound. A paragraph may be headed with the name of some compound, as "diphenylfulvene" in the following, but on reading the paragraph one is confused as to whether the title is a starting material or product. "Diphenylfulvene. In order to compare the properties of our dicyclic fulvenes with those of the open chained analogs, we added maleic anhydride to diphenylfulvene. Our product, like that obtained by Diels and Alder, melted at 168°. Its solutions in ethyl acetate, in glacial acetic acid we found a molecular weight of 180 instead of 328 calculated for the undissociated compound. In benzene the molecular weight was, at the outset, almost normal, 305 instead of 328, but it fell off with time and after ten days was found to be 183" (9).
Authors frequently use two or more systems of nomenclature quite indiscriminately. Much confusion as to terms is found in the literature and proves to be misleading unless one is particularly well acquainted with the subject at hand. For example, on one page the name α,δ-di-(9-phenanthryl)-butane appears in the first paragraph and 1,4-di-(9-phenanthryl)-butane in the last paragraph (10). Even though the reader may recognize these as the same, his trend of thought is interrupted since he must pause to verify his conclusion. In the next example, the use of two names for one compound in the same paragraph is quite inexcusable. Such statements as the following imply two different compounds, and unless the reader is well acquainted with the structure of dehydrocadalene, he probably continues reading with an erroneous conception of the facts. "On cooling, the picrate of 1,6-dimethyl-4-isopropenylnaphthalene (II) separated as orange-red needles . . . dehydrocadalene (II) isolated from the picrate formed . . ." (11).
Refractive indices are reported by some authors as of
the lines C, F, and G′ while other experimenters denote the same lines as Hα, Hβ, Hγ. Such variation is confusing and too much of it eventually causes errors. In certain instances, it is not clear whether the solid-liquid transition is a melting point or freezing point; although these are theoretically the same, they vary under ordinary experimental conditions. In tables such as "Physical Constants" which serve as reference for comparison, a discrimination between the two is essential.
Accurate judgment of data was complicated by many factors. The reader has no way of knowing how meticulously the person in the laboratory performed his experiments. It may be assumed that a carefully detailed and comprehensive publication represents good laboratory work. Unfortunately such an assumption is not wholly true, for there are those who have mastered the art of writing well enough to cast aside any question of laboratory sloppiness. The personal integrity of the authors also might be such that poor results were "doctored" to insure a uniform report. Even though the senior author in charge may be very reputable, he cannot be held totally responsible for the student who decided to neglect taking a melting point and merely recorded a literature value.
In the case of densities, data which would seem fairly good, from the care used in determination, are spoiled because the author fails to state whether the figure is given with respect to water at 4° or water at the same temperature as the determination. The value of densities of compounds which might exist either as supercooled liquids or solids at the designated temperature is diminished because the state of aggregation was not given. Good index of refraction and density data are often worthless because the temperature was omitted. In the case of the indices, the spectrum line is not always given, and while it usually refers to the sodium D lines, such an assumption is not wholly safe.
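The qualifiers discussed above (temperature, reference water temperature for densities, state of aggregation, and spectrum line for refractive indices) can be made explicit fields of a record, so that an omission is visible rather than silently assumed. The following is only a minimal sketch in Python under stated assumptions: the `ConstantRecord` layout, its field names, and the `missing_qualifiers` helper are hypothetical illustrations and are not the card system actually used in this survey.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ConstantRecord:
    """One literature value of a density or refractive index (hypothetical layout)."""
    kind: str                                  # "density" or "refractive_index"
    value: str                                 # kept as text, exactly as printed by the author
    temperature_c: Optional[float] = None      # temperature of the determination
    water_reference_c: Optional[float] = None  # densities only: e.g. 4.0, or the same as temperature_c
    state: Optional[str] = None                # "liquid" or "solid"
    spectrum_line: Optional[str] = None        # refractive indices only, e.g. "D"


def missing_qualifiers(record: ConstantRecord) -> List[str]:
    """List the qualifiers the author left unstated; nothing is assumed on the author's behalf."""
    missing = []
    if record.temperature_c is None:
        missing.append("temperature")
    if record.kind == "density" and record.water_reference_c is None:
        missing.append("water_reference")
    if record.state is None:
        missing.append("state")
    if record.kind == "refractive_index" and record.spectrum_line is None:
        missing.append("spectrum_line")
    return missing


# Illustrative values only, not taken from any cited publication.
density = ConstantRecord(kind="density", value="0.8970", temperature_c=20.0)
index = ConstantRecord(kind="refractive_index", value="1.4950",
                       temperature_c=20.0, state="liquid")
print(missing_qualifiers(density))
print(missing_qualifiers(index))
```

Leaving each qualifier empty unless the author states it keeps the record faithful to the publication; no default of 20° or of the sodium D line is ever filled in.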
In cases of omission of temperature and spectrum lines, the data were not used since the omission implies carelessness and the assignment of temperature and spectrum lines extends beyond the scope of discriminatory judgment.
In judging the purity of compounds discrimination was difficult, for too many of the dependent factors were omitted from the report. The number of recrystallizations for melting point data, the number of fractional distillations, and the amount of drying of liquids were taken into consideration. Carbon-hydrogen analysis was another helpful criterion usually included in publications. The precision of the instruments used was given in too few instances. The constants themselves are often a measure of precision. That is, a two-degree melting range indicates a less pure compound than a one-degree range, and a constant boiling liquid (except in the case of a constant boiling mixture) is more nearly pure than one having a boiling range of several degrees.
Boiling point data were gathered for all pressures given on each compound and the values for dt/dp were
calculated whenever the data permitted, so that an extrapolation of data obtained in the laboratory could be made.
The maintenance of a high degree of accuracy in physical constants was complicated by a few seemingly insignificant details. Many references contained at least one of the following errors: misspelling the author's name, an incorrect abbreviation for the journal, the wrong page number, or the wrong year. The misspelling caused many duplications and, in the case of names which are spelled differently in two languages, as a Russian name in English and German (for example, Philipov and Filipow), involves inconsistency in the bibliography and actual duplication of the same constant on the printed page. The wrong journal abbreviations and page numbers not only involve duplication but also the antithesis, omission, since the correct reference is not easily located. The year is one part of the reference which can very easily slip by unnoticed, as it is not always necessary for location.
A further difficulty encountered, particularly in the search for the alicyclic and aromatic groups of hydrocarbon compounds, was the lack of an adequate glossary defining and giving structures of compounds such as pinene, bornylene, norcamphene, eudalene, azulene,
cinnamene, tricyclene, prehnitene, trindene, periflanthene, mariene, and numerous others. Even after the publication of the "Ring Index," by Patterson and Capell, many of these hydrocarbons were still undefined and no structures were extant from an authoritative source.
Our own group had to be constantly reminded of the types of mistakes that could be and were made in the recording of data. We found that there were certain types of errors, such as transposition or omission of numbers, and omission of letters in spelling, that were common to everyone who ever worked on the project. In addition, all new persons had to be closely watched during the first several weeks of their work in the library, in order to find out what types of mistakes were common to them, and after this checking period they were constantly reminded to avoid these errors. Among these were omitting the temperatures and pressures at which the constants were given, omitting the spectrum line designated for an index of refraction, and not labeling the constant as a density or an index of refraction. Usually it is easy to decide whether the value is for a density or index of refraction, but less ambiguity and confusion result if the constant is labeled. Confusion resulted from the omission of parts of journal names, such as Annalen, Annalen der Physik, or Annales de chimie. Other omissions such as leaving out one or both initials of an author's name were constantly watched for. Since each person had his own variety of mistakes, a periodic exchange of data sheets between members of the group was found conducive to accuracy and clarity in the recording of constants.
In order to fulfill our purpose, to collect and record all reasonably accurate melting points, boiling points,
densities, and refractive indices, it was necessary to cover the original publications in a systematic fashion. This was done by first placing all the data cards in alphabetical order, then each letter in turn was organized according to the journals covered. For example, the authors of a particular alphabet group appearing in the original journals were placed in the volume order of the journal, and before leaving this set of journals every author listed in a particular letter was covered. In this way the journals were freed, so that another worker could begin with another letter. The data from each article were recorded on a separate page headed by the author's name and initials. The complete journal reference and the pages of the beginning as well as the end of the article were also included. The name of the compound appeared in the left hand column, and all data were tabulated in a prearranged order. The data were then given an accuracy rating based on a scale of five to one, five being the highest rating and one the lowest.
Because care in handling detail was the only means of avoiding numerous errors, the following rules of procedure have been thought worth while, in the interest of conservation of time both for the director of the problem and the research group engaged in its effective pursuit. Thus each person engaged in this work followed the same procedure and any newcomers could pursue an already organized course. Many of the rules which follow may seem rather superfluous, obvious, and ridiculous, but sad experience while working with a group of graduate chemists, even Ph.D.'s, shows that the rules are exasperatingly necessary. A number of these rules are recommended as a guide for literature surveys.
1. Write legibly or print in ink. It is particularly important that all numerals and letters be perfectly clear.
2. Initial and date every sheet. In case there was a question on the material, it might be referred to the proper person.
This practice also discouraged carelessness. If it were found necessary to change any rules of procedure, only the data cards completed before a certain date would have to be covered again.
3. Write author's name and initials, journal, volume number, pages of beginning and end of article, and year on first line of each data sheet, i.e., across the top of the sheet lengthwise. Include all of the author's initials. The use of this general form avoided confusion and omissions of any of the above items, all of which are necessary for a complete reference.
4. If article cannot be checked by going to the original, give the secondary source, e.g., C. A. When material is taken from a secondary source it is not correct to attribute the data to the original since the secondary source may be incomplete or incorrect, or both.
5. Do not include patent data. This type of data has been found, in general, not of the accuracy necessary for the study of physical constants.
6. Do not attempt to check Russian or Japanese articles in the original unless familiar with the languages. Although it was possible to find data, such as d = 0.897, near a recognizable chemical formula, it was unsafe to record this, for the author may state in the text some grounds for eliminating the constant.
7. Photostat all extensive tabulations of data (one-half page or more). This practice not only saved time but also eliminated error due to brain lag in copying large tables.
8. Keep a working notebook for cross references. The only means of assuring complete coverage of cross references was to record them when encountered and, as insurance against loss, a notebook was found better for this purpose than loose pages.
9. If the article is one of a series by the author, check previous articles of the series. This check is further insurance of complete coverage.
10. Do not copy the author's obvious mistakes, but use this rule with caution. If a density was labeled as a refractive index or any other mistake equally recognizable was made, it was not recorded. In most cases, however, it was found more satisfactory to record the material and edit later.
11. Do not attempt to edit on the data sheets. Our purpose in collecting material was to record what appeared in the literature, and thus it was more efficient to leave all editing until later, when a well-rounded picture of the various situations could be formed.
A. Record names of compounds just as the author gives them.
B. Do not write down a structure for the compound unless it is given in the article. There was often a disagreement about structure and it was important to know the author's view.
C. Do not take data from tables unless versed by reading the content of the article. The errors of copying quoted data and rounded-off figures were eliminated. A table, on first appearance, often seems to be original, but after reading the article one finds the data to be quoted or given in a more complete form.
D. When a hydrocarbon is formed by the dehydration of an alcohol, do not assume that a pure compound is formed. The anticipated compound is not always formed in chemical reactions, and even though it is, it may be contaminated with products of side reactions.
E. If densities and refractive indices are given without specifying the temperature, do not assume this to be 20°. Temperatures are at times omitted. Although densities and indices are usually reported at 20°, this practice is not always followed.
F. Do not assume that our conventions for "bi" and "di" are followed in the literature. The prefix "bi" is systematically used by us to designate two rings directly joined by a single bond, such as biphenyl, and "di" to indicate two substitutions on a third group as in 1,2-diphenylethane. This point was necessary to put the readers on guard for the many different types of nomenclature in the literature.
G. Record all significant figures as the author gives them. A figure such as 80 does not represent the same degree of precision as the figure 80.0 nor is 1.4950 identical with 1.495.
H. If any comments are made on the data sheet, put them down so that they will not be misinterpreted as comments of the author. Bitter experience was encountered with one person who wrote in his own ideas in such a manner that they appeared to be those of the author.
12. When recording the formula, name, and structure of a compound on the data sheet, check to see if these are consistent. If not consistent, reread the article in an attempt to discover the author's error or your own. At times misprints occur or the author is a little careless, and in such cases the error may be sufficient to make the data worthless.
13. Note whether the densities and refractive indices are for liquid or solid.
14. Put down the author's numbering of ring systems. There is considerable variation in numbering at times, and although a compound is assigned a structure, the numbers may not be in accord with the system generally used.
15. Put down the probable errors and estimated uncertainties of all data as the author gives them. Note whether the probable error is quantitative or qualitative, i.e., actually calculated on the basis of data or merely an estimated order of magnitude.
16. Put down the individual values of the physical constants and not merely the average. If the author gives the average, record this too, and clearly state. This rule emphasizes the principle of recording experimental data. In many instances the author records several values without indicating which is best.
17. Do not put down boiling point of compound if boiling point range is greater than 5°. If boiling point is rejected, also reject accompanying density and index of refraction data. Densities and indices of refraction are determined on this same boiling fraction, consequently they are no better than the boiling range value itself.
18. Do not record melting points of range greater than 2° and accompanying data. A melting range shows impurities. This rule has been violated in a few cases, such as the penta-aryl-ethanes, which decompose near their melting points.
19. With the exceptions contained in Rules 17 and 18, record all boiling point, melting point, density, and index of refraction data under all conditions of pressure, temperature, wave lengths, etc.
20. In addition to data on the four constants, put in the "additional data column" data such as specific rotation, sublimation temperatures, transition points in liquid or solid, etc. These constants appear too infrequently to warrant a separate column in the tabulation.
21. Write structures whenever author gives them, except for benzene, naphthalene, anthracene, phenanthrene, fluorene, and very simple derivatives of these. It has been noted that surprisingly few names are represented by the same structural formula in the literature by all experimenters. It would have been a waste of time, however, to record formulas for the above compounds.
22. If any data are recorded other than direct experimental data, note whether data are extrapolated or calculated. Our tabulations are strictly of experimental work and calculated values are of interest only from a comparative standpoint.
23. If any question of procedure arises not covered by these rules, bring it up for discussion. Any complications arising should be made known to everyone working on the physical constants project and it is well to have several viewpoints on any subject.
24. In checking another researcher's work, do not correct apparent errors of judgment without consulting that person. Minor errors are to be corrected with colored pencil. Checking was done more rapidly than initial coverage and consequently the apparent was not always correct when viewed with closer scrutiny.
25. Make an estimate of the error of each constant on a scale of five to one. The basis for rating the accuracy of each constant is found in the original publication, since secondary considerations, such as the standing of the senior author, the agreement of the constant with others, and the date of the work, are the only points by which the work may be evaluated from the data sheet.
26. Record all data for deuterium hydrocarbons. The deuterium compounds have somewhat different properties from the corresponding light hydrogen hydrocarbons and are necessary to complete the survey of hydrocarbon physical constants.
After the review of these rules for the mechanics of a literature survey, the conclusion or question that might be in anyone's mind is, "Why are they necessary for a group of trained scientists?"
The answer lies in the lack of fundamental training in the use of the literature for a bachelor's degree, as well as inability to foresee the outcome of carelessness in handling certain details, and of deviation from a general form. There are too few colleges and universities that have made such training a compulsory part of their undergraduate chemistry curriculum. For a number of years there has been a tendency in this direction, since a knowledge of French or German is encouraged and advised before a bachelor's degree in chemistry is granted. However, there must be more emphasis placed upon the importance of the mechanics of scientific literature research.
In view of the large number of Russian scientific and
technical publications, it becomes increasingly desirable for the researcher to cover the original Russian literature rather than to depend upon secondary sources. The need for courses in Russian in the college curriculum is far greater today than ever before.
A suggestion for alleviation of the problem of literature surveys for the graduate scientist would be the preparation of an undergraduate thesis on bibliographic research, and if the research was too extensive and involved a master's degree could be granted for its successful conclusion. Perhaps even a Ph.D. thesis could be submitted correlating the findings of a bibliographic study.
Even the well-trained scientist may bungle his first efforts in literature research. He will eventually develop his own methods, but if in the examination of the literature he finds publication of the accumulated experiences of his predecessors in the field of surveying, the development will be much more rapid. At the close of this war there will be an even greater necessity for such training, since there will undoubtedly be released for publication a tremendous volume of research which is now under secrecy orders because of its relation to the war efforts. When this occurs there must be research scientists who are trained and ready to handle it. The work we have described here covered more than 25,000 original references and its completion required an aggregate of sixteen people for
over five years. But this is a mere "drop in the bucket" compared to what will be necessary in the days to come for the compilation and summary of material in other lines of research. If the efforts of specially trained scientists in industrial libraries are required to give fundamental training in this type of research, much time will be wasted that could well have been put to other uses, had the incoming group been trained in the mechanics of scientific literature surveys during their undergraduate training.
LITERATURE CITED
(1) Chemical Abstracts, The American Chemical Society, Ohio State College, Columbus, Ohio.
(2) Chemisches Zentralblatt, Verlag Chemie G. m. b. H., Berlin.
(3) British Chemical Abstracts, London Chem. Society, Central House, London.
(4) Beilsteins Handbuch der Organischen Chemie, Julius Springer, Berlin.
(5) Richter-Anschütz, Chemie der Kohlenstoffverbindungen oder Organische Chemie, Akademische Verlagsgesellschaft m. b. H., Leipzig.
(6) Landolt-Börnstein, Physikalisch-Chemische Tabellen, Julius Springer, Berlin.
(7) "International Critical Tables," McGraw-Hill Book Co., Inc., New York.
(8) EASTERFIELD AND MCCLELLAND, J. Soc. Chem. Ind., 42, [remainder illegible].
(9) KOHLER AND KABLE, J. Am. Chem. Soc., 57, 917-8 (1935).
(10) BERGMANN, J. Am. Chem. Soc., 58, 1678-81 (1932).
(11) BARNETT AND COOK, J. Chem. Soc., 1933, 224.