Procedures for Assessing Errors. - Journal of Chemical Information

Rule 7. Watch out for the following frequent errors:

a. O for C, and vice versa, in the first member of a side chain.
b. Cyclopentyl or cyclohexyl rings with just a vertical double bond, implying a methylene group, where C=O is meant.
c. Careless usage of the solid, dotted, and wavy lines to show known or unknown stereo bonds. Once a paper uses a dotted or wavy line, one would expect that the solid lines mean something, but this is not always the case.
d. Beware of the 5-position in steroids. The hydrogen at this position can be either α (dotted) or β (solid). Authors often fail to indicate which it is, even where this can be determined from flow sheet or context.
e. Trade and nonproprietary trivial names often imply more than they say. Where an author says Demerol he means meperidine hydrochloride. Biochemical literature is loaded with articles referring to epinephrine where epinephrine bitartrate is meant. Recent issues of CBAC have made a false distinction between 2-isopropylamino-1-(2-naphthyl)ethanol, Registry No. 54808, and its hydrochloride, No. 51025, under the false assumption that the trade name Nethalide refers to the base. It is the hydrochloride.
f. It is easy to confuse pyrrolidino with pyrrolidono.
g. In phenothiazines, the U.S. 3-substituent would often be called a 2-substituent in European literature. This confusing designation will be carried over to CA if the name appears in the title, but may be corrected in the CA Index. It will not be corrected in CBAC.
h. Italic lower case p is often confused with Greek rho, and rho with sigma.
i. In dosage statements, check milligrams vs. micrograms.

This list could be expanded almost indefinitely. At the risk of appearing to be overly cynical, I repeat Rule 1. Trust nothing.

If I have seemed unduly harsh in my criticism of current practice, I revert to humanistic literature and cite Wordsworth’s “Ode to Duty” (21): “O, Duty! If that name thou love . . . check the erring and reprove.” To which there should, of course, be an erratum. Dear Mr. Wordsworth: “Love does not rhyme with reprove.”

LITERATURE CITED

Shakespeare, Wm., “Julius Caesar,” Act 5, Scene 3, line 67.
Pope, Alexander, “Essay on Criticism,” line 525.
Descriptive Price List, Winthrop Laboratories, Special Chemicals Dept., April, 1965.
Private communication.
Wagner, E. S., R. E. Davis, J. Am. Chem. Soc., 88, 7 (1966).
McBee, E. T., W. L. Dilling, H. P. Brandlin, J. Org. Chem., 28, 3595 (1963).
Ibid., p. 2256.
Ibid., 29, 3744 (1964).
Brace, N. O., J. Org. Chem., 29, 3744 (1964).
Ibid., 28, 3102 (1964).
Fieser, L. F., Fieser, M., “Topics in Organic Chemistry,” Reinhold Publishing Corp., New York, 1950.
Wordsworth, William, “Ode to Duty.”

Procedures for Assessing Errors*

KENNETH L. COE, DON P. LEITER, Jr., HARRY L. MORGAN, and FERD R. WETSEL
The Chemical Abstracts Service, The Ohio State University, Columbus, Ohio 43210

Received May 26, 1966

User confidence in the validity of published chemical literature is directly related to the accuracy with which it is presented. The nature, types, and sources of errors which find their way into the primary and secondary journal literature are analyzed. A description of the various methods by which the effects of such errors can be minimized through abstracting and indexing techniques is presented.

The Chemical Abstracts Service (CAS) recognizes that many errors are introduced into primary and secondary journals in the publication processes. Rather than dwell on the present status, which will be reviewed briefly, we will discuss CAS plans for minimizing these errors through various abstracting, indexing, and computer-handling techniques.

* Presented before the Division of Chemical Literature, Symposium on Error Control in the Chemical Literature, 151st National Meeting of the American Chemical Society, Pittsburgh, Pa., March 23, 1966.

JOURNAL OF CHEMICAL DOCUMENTATION, VOL. 6, No. 3, AUGUST 1966

At CAS, abstracts are prepared, edited, and indexed by subject and language experts. This processing locates and resolves many of the detectable errors or ambiguous statements which occur in the primary literature and/or in the abstracting and indexing operations. However, not all such errors are found even in these rigorous review and edit stages. Since errors in any part of a paper or of the prepared abstract (title, reference, author names, body of the abstract, body of the paper, or even the indexes) can slip through to the published record, methods of correcting these errors have been established.

For CAS, if an error in a published abstract is of a major technical nature, the abstract is considered to be of no value, and it is reprinted correctly in Chemical Abstracts (CA) in its entirety; in such a case only the republished abstract is covered in the volume and/or the collective indexes. Thus, in retrospective searches the error-containing abstract cannot be located through the indexes.

Other errors are of a type which can be corrected by entries in our indexes. Errors in the title of a paper, in an author name, or in the journal citation are corrected in the printed author index. Errors in the body of the abstract may also be corrected in indexes. Figure 1 shows such a situation for the CA Subject Index.

Zirconium compounds. (See also Zirconium alloys.)

dicyclopentadienylzirconium bis(tetrahydroborate), 62:3638h (Ref. should be to p. 1798.)
dicyclopentadienylzirconium chloride tetrahydroborate, 62:3639a (Ref. should be to p. 1798.)

Figure 1. Chemical Abstracts Subject Index corrections to published abstracts.

The volume indexes are of substantial size, and under present methods of composition, typesetting, editing, and printing, the early parts of an index are completed before the latter parts of the index are completely processed. Errors in the early parts of an index detected while the latter parts are still in preparation are covered by errata listed at the end of that particular index. For errors detected still later, corrections are printed in the collective index for the period; errors detected after the publication of the collective index are corrected in the errata listed in the following collective index. Figure 2 is an example from a collective index.

When CAS composition operations shift to a computer base, the editing of the entire index will be completed prior to composition; the more flexible computer composition procedures will permit corrections to be made to the index entries much later in the index generation process. In this way many of the errata in current volume and collective indexes will be eliminated.

CAS will within the next five years be a completely computer-based operation. At that time the final processing of Chemical Titles (CT), Chemical-Biological Activities (CBAC), and other contemplated new services will be fully integrated into the routine preparation of CA and its indexes. As CA is converted to a computer base, related operations will also be fully integrated so as to reduce multiple handling of the same material, which can result in the introduction of errors.

One example of the integration of CAS procedures is the present movement toward single keyboarding of citations. All of the information for a citation (including title, author names and address, complete journal reference, and the original language of the article) is input to the computer and stored there in one form. The information resulting from this single keyboarding can be manipulated by the computer so that the output can be in any form required for inclusion in any of a variety of publications.

An additional example of this integration is the retrieval of indexed names for compounds previously recorded in the CAS Chemical Compound Registry System. The ability to retrieve names in this manner will eliminate renaming and editing names for compounds previously indexed, thereby reducing costs and the possibility of introducing errors.

The shift of production operations to the computer will allow many routine editing procedures to be carried out by computer programs. Two basic editing procedures are being programmed. In the first, the format is being analyzed so that it can be defined, and the information is being analyzed for consistency and completeness. For purposes of illustration, consider the CAS Chemical Compound Registry System. When a structural formula is input to the computer it is accompanied by a molecular formula calculated by a chemist. The computer input is not verified, which saves much time; instead, the computer is programmed to check the structural formula data against recorded standards. When the consistency of the elements of structural data has been established, the computer program recalculates the molecular formula and compares it with the one provided by the chemist. Should any discrepancy be detected, the compound is flagged for remedial action and re-input. Our experience in the processing of more than 300,000 structures indicates that the detection devices are over 99.7% effective on the total input for two-dimensional representation.

Programming the editing of structural formulas by computer is a great deal simpler than programming the editorial review of the average abstract.
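The molecular-formula cross-check described above can be sketched in a few lines. This is an illustration only: the flat list of element symbols standing in for a structural record, and the function names, are assumptions made for the example, not the Registry System's actual record format or programs.

```python
import re
from collections import Counter


def formula_from_atoms(atoms):
    """Recalculate a molecular formula from a list of element symbols.

    `atoms` is a hypothetical stand-in for a structural record: one
    symbol per atom, hydrogens included. Carbon compounds are written
    C first, H second, the rest alphabetically; others alphabetically.
    """
    counts = Counter(atoms)
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def parse_formula(text):
    """Turn a formula string such as 'C2H6O' into element counts."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", text):
        counts[symbol] += int(number) if number else 1
    return counts


def needs_review(atoms, chemist_formula):
    """Return True when the recalculated and supplied formulas disagree."""
    return Counter(atoms) != parse_formula(chemist_formula)


# Ethanol entered as a structure, with the chemist's formula alongside.
ethanol = ["C", "C", "O"] + ["H"] * 6
print(formula_from_atoms(ethanol))       # recalculated formula
print(needs_review(ethanol, "C2H6O"))    # formulas agree
print(needs_review(ethanol, "C2H5O"))    # discrepancy: flag for re-input
```

Comparing element counts rather than formula strings keeps the check independent of the order in which the chemist happened to write the elements.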
Structural formulas are the easier case because the rules for preparing structural diagrams are much more easily defined than the rules governing natural language. For the most part, we are at this time dependent upon humans to accomplish most editorial review of open text. Certain applications for review of spelling and hyphenation can now be assigned to the computer; however, the detection of inaccuracies and incompleteness in the text is generally left to the human.

An area which lies somewhere between the extreme difficulty of editing open text and the relative ease of editing structure data is the identification of inconsistencies in systematic chemical nomenclature and the translation of such nomenclature into more detailed structure descriptions. CAS is now working on computer programs to accomplish these purposes. By comparing the structural formula input to the Compound Registry with the record generated automatically from systematic chemical names, inconsistencies can be identified and brought out for review by a chemist.

The second procedure for detecting errors prior to computer storage is the building of machine checks into the data. An example is the Chemical Compound Registry Number, which is a numerical address used for interlinking related data in various computer files. In copying numbers manually a simple interchange of two digits occurs frequently. Such errors are difficult to identify in a completely manual system, but can be avoided in the computer system

[Figure 2 facsimile: a "Corrections and Additions" page listing, by page number, errata to the Second Decennial Index (1917-1926) Subject Index, the Third Decennial Index (1927-1936) Subject Index, and the Fourth Decennial Index (1937-1946) Author Index. Each entry names the index heading and gives the corrected volume, page, or column reference, or an entry to add or delete.]

Figure 2. Collective Index errata to previous indexes.

by adding a computer-generated digit. The Registry Number is a nine-digit number in which the ninth digit is a so-called "check digit" which results from programmed computation from the first eight digits of the assigned number. Thus the computer can detect most errors in the Registry Number.

In addition to using the concept of a check digit in the Registry Number, a check digit has been incorporated in the CODEN reference. This CODEN reference is a compact, unambiguous abbreviation used to represent the title of serial, and some nonserial, publications. As a further check during the processing of CT and CBAC, the volume, issue, page, and year portion of the reference is checked by computer lookup to be certain that it is within the prescribed limits for the journal being processed.

Beginning with Volume 66 each CA abstract will be numbered separately. The abstract number will include a check digit. We hope that other publications will adopt a check-digit system for bibliographic citations to improve the reliability of machine-based systems for bibliographic retrieval. Patent numbers are a case in point in which check digits would be helpful.

The check digit does not, of course, eliminate the possibility of errors; however, it does minimize the probability of an undetected error. Since the check digit is an integral part of a code, it is not necessary that the reader know the rules by which the check is computed, though the existence of the check digit and the means of computing it should be made generally known to those who have computer systems. As a matter of fact, it is important that standards for check digits be established to increase the general utility of those which exist.

In practice, some level of undetected error must be assumed to be present in a store of information. To gauge the reliability and thus the utility of the store, it is desirable to make an effort to determine the extent and the nature of undetected errors. Such analysis is also obviously essential in order to provide for the improvement of the practices which led to the errors.

To determine the level of undetected errors in the CAS Chemical Compound Registry System on a statistically reliable basis, routine sampling procedures are used. In practice, the sample is completely reprocessed, independently of the initial processing, and the results of the two processings are compared in detail. This procedure enables identification of the types, causes, and sources of undetected errors. The sampling procedure is not designed for the detection of errors. The fact that it does detect them is purely incidental.

As users of chemical information we understandably desire the elimination of all errors in both primary and secondary publications. However, as representatives of an organization which must recover the costs incurred in producing its publications, we must concede that some errors will exist in the published literature. CAS is striving, wherever possible, to reduce the number of errors in its publications and to make appropriate reference to errors which are detected in the primary literature we cover.
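The check-digit idea described earlier for the nine-digit Registry Number can be illustrated with a short sketch. The article does not state the rule by which the ninth digit is computed, so the position-weighted sum modulo 10 used here is an assumption chosen for illustration, and the function names and sample numbers are hypothetical.

```python
def check_digit(body: str) -> int:
    """Compute a check digit from an eight-digit body.

    Assumed rule (not given in the article): weight the rightmost
    digit by 1, the next by 2, and so on, then take the sum modulo 10.
    """
    if len(body) != 8 or not (body.isascii() and body.isdigit()):
        raise ValueError("body must be exactly eight digits")
    total = sum(w * int(ch) for w, ch in enumerate(reversed(body), start=1))
    return total % 10


def with_check_digit(body: str) -> str:
    """Append the computed check digit to form a nine-digit number."""
    return body + str(check_digit(body))


def is_valid(number: str) -> bool:
    """Return True when the ninth digit matches the first eight."""
    if len(number) != 9 or not (number.isascii() and number.isdigit()):
        return False
    return check_digit(number[:8]) == int(number[8])


registry = with_check_digit("12345678")   # arbitrary sample body
print(registry, is_valid(registry))

# Interchange the 7 and the 8, as in a manual copying slip.
miscopied = "12345687" + registry[8]
print(miscopied, is_valid(miscopied))
```

Under this assumed rule, adjacent positions carry weights that differ by one, so interchanging two different adjacent digits always shifts the weighted sum by their difference and is caught. Some single-digit substitutions still pass (for example, changing the digit with weight 5 by an even amount), which is in keeping with the statement that a check digit catches most, not all, errors.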

Control and Elimination of Errors in ISI Services*

IRVING H. SHER, EUGENE GARFIELD, and ARTHUR W. ELIAS
Institute for Scientific Information, 325 Chestnut St., Philadelphia, Pennsylvania 19106

Received May 16, 1966

The Institute for Scientific Information produces several indexes and abstracting and alerting services which contain error-controlling features. Current Contents and Index Chemicus have different and unique mechanisms for finding and correcting errors that have appeared in the primary literature and those generated during the production of these secondary publications. Aspects of error control in these publications will be discussed.

Current Contents, Index Chemicus (IC), Science Citation Index (SCI), and ASCA (Automatic Subject Citation Alert) each have different and unique mechanisms for finding and correcting errors that have appeared in the primary literature and those generated during the production of these secondary publications. It is obviously only possible to state very briefly some of the problems that exist, as well as some of the methods that are used to overcome them. This paper illustrates some types of errors involving people, other types involving machines, and, finally, general or composite errors.

The error-checking methods vary in complexity and extent according to the circumstances. For example, in the production of the Index Chemicus, not only are the articles indexed and graphically abstracted, but the molecular formulas of the individual compounds described by each author are recalculated by trained chemists. These abstracts are then sent to the original author for approval, giving him the opportunity of confirming corrected errors or making additional changes. Thus, the data found in Index Chemicus are sometimes more accurate than in the original article from which the abstract was prepared.

There is another important aspect to the whole problem of errors in the literature. Consider the perpetuation of a mistaken method or data that can go on being used for years without knowledge of subsequent modification. It is one of the unique capabilities of the Science Citation Index system that a straightforward check of the indexes will reveal such “corrections” (Figure 1).

An example of what one error in Current Contents can do is the case of an article by D. M. Baron of London,

[Figure 1 facsimile: a column of entries from the 1965 Science Citation Index, together with the correction note to which it leads:]

D. G. Chapman, Ann. Math. Stat. 36, 1583 (1965). Correction to “A Comparative Study of Several One-Sided Goodness-of-Fit Tests,” by D. G. Chapman. In the paper cited above (Ann. Math. Statist. 29 (1958) 655-674), it is stated that “any monotone test is admissible.” This is in reference to the hypothesis F = F0 against the alternative F < F0. K. Doksum has pointed out [...] a counter-example to this assertion, which should therefore be deleted.

Figure 1. Actual correction note by Chapman citing original 1958 paper.

* Presented before the Division of Chemical Literature, Symposium on Error Control in the Chemical Literature, 151st National Meeting of the American Chemical Society, Pittsburgh, Pa., March 23, 1966.