Crime in the Classroom: Detection and Prevention of Cheating on Multiple-choice Exams

David N. Harpp and James J. Hogan
McGill University, Montreal, Quebec, Canada H3A 2K6

There are few instructors who have not had to consider the problem of students cheating on exams. This topic has had considerable recent attention. The data suggest that between 21% and 96% of students have cheated at least once (1). Incidents can take the form of counterfeit term papers, copied lab reports, crib sheets, or blatant collaboration on various kinds of examinations. While anecdotal experiences are many, concrete approaches to "solve" the problem have been relatively few. Difficulty can arise from the time-consuming, unpleasant task of accusing a student. The ultimate requirement involves presenting a case to a university tribunal; in many instances the evidence is secondhand, circumstantial, or statistical in nature. Administrators appear reluctant to involve themselves in the judgmental process due to the nature of the evidence, fear of possible lawsuits, and "bad press". Even where there is strong evidence to support an occurrence of cheating, students often are not prosecuted and seldom convicted. On this point, an offensive but interesting book recently has gained considerable national attention (2). It is entitled "Cheating 101" and describes not only how to cheat, but also holds academics up for considerable contempt for being "nonvigilant" with regard to cheating. Virtually all of the "schemes" described are predicated on permissive seating arrangements in crowded rooms involving single-version multiple-choice exams. In this paper we will discuss not only the problem of detecting unusually similar pairs of multiple-choice exams, but also will provide commonsense prevention measures that virtually eliminate this particular type of dishonesty. In addition, we will give convincing evidence that the defense often provided by accused students, "We studied together, therefore our exams are similar", is simply unfounded.

"-

-

¹Harpp, D. N.; Fenster, A. E.; Schwarcz, J. A. "The World of Chemistry: A Chemistry Course for All"; The Third Chemical Congress of North America, Toronto, Ontario, June 1988. This is a large course (~500 students) with students from nearly every department on campus. The Faculties of Science (35%), Arts (35%), and Management (20%) make up the bulk of the class. As a consequence of the large class size, multiple-choice exams are used for evaluation.

²The program EXAMSORT was written by M. Dominic Ryan and permits the storage of up to 9999 questions. Furthermore, the program permits exam questions and answers to be scrambled, resulting in any number of content-identical exams.



Results and Discussion

In the first offering of our 1982 course "The World of Chemistry",¹ a proctor noted several students sitting together, spending excessive time "apparently" looking at adjacent exams. At the end of the testing period their papers were confiscated. Cursory analysis revealed large marks in the margins which corresponded to many of their mutual answers. In addition, all of these exams were more similar to one another than to any other exams of a similar grade in the class. Formal complaints were filed, a hearing was conducted by the Associate Dean of Science, and the students were given a strong warning. The evidence was not judged to be strong enough to warrant a harsh penalty.

A computer program was created² that stores questions and generates multiple-version exams for this course. Over the past 11 years the use of four versions for each mid-term test (prelim) essentially solved the problem of copying. These exams are organized by the chemistry department during the early evening hours so that students can be dispersed in a number of rooms. However, McGill University administers its final exams for large classes in the gymnasium. Thus, the complexities of idiosyncratic professorial enforcement of scrambled exams resulted in the use of single-version tests over the years. In the gym, students in the same course were normally seated in a block with tables 60 in. apart in all directions. Contrary to popular opinion, this distance is not great enough to prevent viewing nearby exams. This format is the one followed in many, if not most, colleges and universities.

In May of 1989, a student (not in the course) approached one of us (D.N.H.) after the final exam and revealed that another student had given a third student all of the answers for that exam, permitting his answers to be copied from his (optical scanning) sheet. He eventually disclosed the identities of the accused students when assurance was given that no prosecution would be carried out.

On close examination of the two answer sheets, it was found that the suspect pair had 97 out of 98 questions identically answered, including 23 identical wrong answers! The informant was asked, "If some independent means


Table 1. Typical Data on Student Pairs^a

                           ave. student   ave. student   student
                           pair^b         pair^c         pair 1
# Errors (each student)    28, 28         18, 18         20, 25
EIC                        14             9              18
EEIC                       7              5              17
EEIC/EIC                   0.50           0.56           0.94
# Differences              35             22             10

^a These data are taken from real exams (~70-90 questions) and are typical of the most incriminating evidence. In all cases the pairs were seated as immediate neighbors. It is clear that the strongest evidence exists where a given pair of students have >20% of the total number of questions as exam errors in common (EEIC, or identical wrongs). The number of expected EEICs varies with the grade and the number of questions. In our experience, only about 0.1% of all pairs will have ~10 or fewer differences in their papers (these are usually the pairs in the 90% grade range).

^b This pair of students represents a typical case (an average pair of students) for a class with 100 questions (mean grade of 72%). Typically the average pair of students will have 28 errors, with about half that number of errors in common (EIC = 14); usually about half of the EIC value will predict the exact errors in common (EEIC = 7). The greater degree of similarity of student pair 1, as compared to the average pair, is easily seen.

^c This pair of students represents a typical case (an average pair of students) for a class with 60 questions (mean grade of 70%). Typically the average pair of students will have 18 errors (EIC = 9, EEIC = 5).

could be evolved to produce the names of the cheaters (i.e., a computer check), would there be objection to prosecution?" There was none, so our computer center was requested to create two sorting programs to accomplish the task. The interesting finding was that these programs revealed not only the original suspect pair, but also a number of other highly suspicious pairs. One of the two sorting programs compares the answer sheets of all pairs of students:


total pairs = N(N - 1)/2, where N = class size

It "flags" those with less than a predefined number of differences in the answers to their questions. Their errors in common (EIC, i.e., questions answered incorrectly by both students) are computed together with the number of exact errors in common (EEIC, sometimes termed "exact wrongs"). If the ratio EEIC/EIC exceeds a certain value (0.75), they are kept and recorded as "suspects". The choice of 0.75 was derived empirically: pairs with less than this fraction were not found to sit adjacent to one another, while pairs with greater than this ratio almost always were seated adjacently. A typical output is displayed in Table 1.
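As a concrete illustration, the following is a minimal sketch of such a pairwise sort. The original programs were written in BASICA for IBM PCs; the Python below, with its function and variable names, is our illustration rather than the original code, using the thresholds quoted above (a cut of ~10 differences and EEIC/EIC >= 0.75 are stated in the text and Table 1; their use as defaults here is our assumption).

    from itertools import combinations

    def compare_pair(a, b, key):
        """Compare two response lists against the answer key."""
        diffs = sum(x != y for x, y in zip(a, b))                    # answers that differ
        eic = sum(x != k and y != k for x, y, k in zip(a, b, key))   # errors in common
        eeic = sum(x == y and x != k for x, y, k in zip(a, b, key))  # exact errors in common
        return diffs, eic, eeic

    def flag_suspects(sheets, key, max_diffs=10, min_ratio=0.75):
        """sheets: {student_id: list of responses}. Flags pairs with few
        differences and a high EEIC/EIC ratio, as described in the text."""
        suspects = []
        for (id_a, a), (id_b, b) in combinations(sheets.items(), 2):  # N(N-1)/2 pairs
            diffs, eic, eeic = compare_pair(a, b, key)
            if diffs <= max_diffs and eic > 0 and eeic / eic >= min_ratio:
                suspects.append((id_a, id_b, diffs, eic, eeic))
        return suspects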

Figure 1. Frequency distribution of the number of pairs of students, one of whom has # correct responses on the test and who have given # different responses to the test questions. The 8 darkened "spots" (outliers) represent 11 unique pairings (16 different people) out of 340 students.

The second program performs a similar analysis of differences for pairs in these tests within a certain grade range (or even the whole class), and the frequency of differences is displayed (Fig. 1). The number of correct answers is plotted against the number of differences in the exam papers for the pair. The frequency of this combination is represented by the height on the graph. By plotting all possible pairs, virtually the same plot is obtained because most cheating results in similar grades. The result is a "mountain" of normal numbers of differences. "Suspects" are revealed as islands at a distance from the mountain in what we have called the "Sea of Corruption". Because the student ID numbers are linked to the grade as well as to the differences in their exams, identification is certain. Seating charts are retained for final examinations; thus the placement of all students is not in doubt. To date, all of the obvious "outliers" have been seated as immediate neighbors in the exam room. The term "outlier" is taken to mean any data point on plots such as in Figure 1 where the point is separated from the "main group" on the side of the plot (left) associated with exams of greater similarity.

The suspects initially identified by the informant are shown at 74/1 (74 correct answers versus 1 difference in their exams) in Figure 1. In this plot one of the two possible exam scores made by this pair (of course, in some cases a given pair made the same score) is recorded on the Y axis. In almost all cases, the outliers (suspects) are defined clearly in either of the two possible plots, although in situations where one student makes a grade somewhat higher than the other (but with a relatively small number of differences for the grade that was made), the data point might be obscured under the "mountain". Plotting the data using the other student's grade reveals the outlier pair. The few pairs that were uncovered in this fashion met all of the suspect criteria cited in the text.

McGill University's grading program, used on optically scanned multiple-choice examinations, provides, in addition to grades, the number of students giving each of i responses to each of k questions. Dividing these by the number of students (N) taking the examination gives an i x k matrix of values p_ik, where p_ik is the probability that a student writes response i to question k. Normally i has values from 1 to 6, the sixth answer being a blank (no response).


Figure 2. Frequency distribution of the logarithm of PROB, as described in the text, of randomly selected pairs of "A" students.

If desired, this program also can generate such a matrix for an arbitrarily selected subset of students (for example, the "A" students, "B" students, etc., vide infra). The probability that a pair of students chosen at random gives identical responses to a specific question is p_ik², while that for a pair of students giving nonidentical answers is p_ik·p_jk (i ≠ j). The purpose of the calculation is to formulate a probability for pairs of students who have not acted independently. This method does not calculate the absolute probability, which would require comparing all possible responses of students writing their specific pair of exams. The probability that is calculated uses identical-only responses and appears to give a reasonably sensitive measure of the degree of copying. Each time two students respond identically to a question, the product (called PROB, initially set to unity) is multiplied by the quantity p_ik². Because tests may have upwards of 100 questions, PROB can become very small. The calculation is done logarithmically because occasional results violate underflow conditions; this is a characteristic problem with most desktop PCs.

While this calculation of PROB can be performed for all of the N(N - 1)/2 pairs of students (sometimes classes have had 700 students), it is sufficient to calculate PROB for the suspect pairs and compare it to the distribution of PROBs for a group of non-suspects. About 25 non-suspects in the grade range of the suspect pairs are randomly selected. The probability calculation is carried out using the group responses for each question on all possible pairs (300 for 25 exam papers), and a mean and standard deviation are calculated. To establish the non-suspect distribution, experiments with a different, larger or even smaller sample reveal that there is little variation in the final statistical data for the group. This is expected from the Central Limit Theorem (3) or the Random Walk Theory in a number of fields of physics.

³The K-Index was developed by the Educational Testing Service and has been in use since 1980. The methodology and reliability have been tested and upheld in the courts. See Denburg versus Educational Testing Service, No. C-1715-83 (New Jersey Superior Court, Chancery Division, Aug. 4, 1983).

⁴The sorting programs to generate 2-D plots similar to Figure 1, the error-in-common analysis as in Table 1, and the probabilities have been written in a language (BASICA) to function on IBM PCs. A full analysis for a class of 350 on a 100-question exam takes about 16 h with an 80386 chip on a computer running at 33 MHz. For a class of 50 on a 50-question exam the time would be about 10 min. All of the calculations can be carried out on a mainframe computer in very short order, although the turn-around time is likely longer than with the in-house approach. We will provide the IBM programs on a 5.25- or 3.5-in. disk if a formatted blank disk is sent to us.
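To make the PROB bookkeeping concrete, the following sketch (again ours, in Python, not the original BASICA program) builds the p_ik matrix from the class answer sheets and accumulates the logarithm of PROB for a pair, which avoids the underflow noted above:

    import math
    from collections import Counter

    def response_matrix(sheets):
        """p[k][i] = fraction of the class giving response i (1-5; 6 = blank)
        to question k, built from all answer sheets."""
        students = list(sheets.values())
        n = len(students)
        p = []
        for k in range(len(students[0])):
            counts = Counter(s[k] for s in students)
            p.append({i: counts.get(i, 0) / n for i in range(1, 7)})
        return p

    def log10_prob(a, b, p):
        """log10 of PROB: sum of log10(p_ik^2) over questions the pair
        answered identically. More negative = more surprising agreement."""
        lp = 0.0
        for k, (x, y) in enumerate(zip(a, b)):
            if x == y and p[k][x] > 0:      # p > 0 whenever x was observed in the class
                lp += 2 * math.log10(p[k][x])
        return lp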


Figure 3. The same plot as in Fig. 2, expanded to show two extreme outlier pairs. The arrow at 6σ indicates >6 billion/1 odds. The pair at 74/1 (74 correct; 1 difference in Fig. 1) were at 21σ; the pair at 61/13 are shown at 28σ. There was very extensive copying in these two cases, with >20 wrong answers in common. The odds against these events happening by chance are in excess of 10^11.

This group of non-suspects has grades similar to those of the suspects. The probability data are computed for them (using the group responses for each question) to establish the non-suspect distribution.

A plot of the distribution of the logarithm of these non-suspect probabilities is Gaussian or "bell" shaped (Fig. 2). This distribution is tested for its adherence to the shape of the Gaussian curve by calculation of the mean, standard deviation (σ), and kurtosis, and by a comparison of the number of events between ±1, 2, and 3σ to that of the integral of the Gaussian function. Once the standard deviation of the distribution curve is known, the calculated value of PROB for any given pair of students may be determined to lie n·σ from the mean. The probability that this value of PROB has been achieved in a random manner (without collaboration between the two students) is determined from the properties of a Gaussian. The same Gaussian is portrayed on a different scale in Figure 3, where the positions of some of the notable outliers are evident.

If these odds exceed a certain value (assuming close-neighbor seating), it is believed that there is a very high likelihood that copying took place between that student pair. To choose a "cut-off" value in advance is a difficult matter. For example, in performing ballistic tests, the international crime-prevention community has, by analysis of the process of matching scratches on comparison bullets, set odds for this event of 6.2 billion to 1 (at least 13 scratches have to match). The conclusion is usually that the markings could not have happened by chance alone (4) (when using standard deviations to determine odds, 6.3σ gives 6.2 billion to 1). The Educational Testing Service provides for "Questioned Scores" (5) and uses a specific index, the K-Index,³ as a guide. Essentially this is a deviation from the mean of 3.7σ or greater (at least 10,000 to 1 that the event was not random). Our conclusion, based on a consideration of the above, is that outliers having ~5 standard deviations (~3.5 million/1) represent a suitable cut-off value. This depends somewhat on the class size and the number of questions in the exam (6). Our methods and results⁴ appear consistent with those of the Educational Testing Service (Princeton, NJ) for finding problem cases (5).
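The correspondence between standard deviations and odds quoted above can be checked from the one-sided Gaussian tail area. A brief sketch follows; SciPy is our choice of tool here, and the exact odds depend on rounding and on the one-sided convention we assume:

    from scipy.stats import norm

    for z in (3.7, 5.0, 6.3):
        tail = norm.sf(z)     # one-sided Gaussian tail area beyond z sigma
        print(f"{z} sigma -> roughly {1 / tail:,.0f} to 1 against chance")

    # 3.7 sigma -> ~9,300 to 1        (the ETS K-Index guide, ~10,000/1)
    # 5.0 sigma -> ~3.5 million to 1  (the suggested cut-off)
    # 6.3 sigma -> ~6.7 billion to 1  (cf. the 6.2 billion/1 ballistics figure)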


The first analysis to detect cheating appears to have been carried out as early as 1927 (7). Since then, there have been a number of related approaches to detecting cheating in multiple-choice exams (8). The excessive presence of identical errors in a given pair of papers is the basis of most forms of analysis of this problem (5-8). The general principles of this approach recently have been elaborated (9). Of special note is the work by Frary (6). This paper approaches the detection problem in a rather different fashion from ours, but it comes to similar conclusions. Frary has kindly carried out his analysis on our classes, and not only do the same outliers show up, but also they appear in nearly the same order of culpability. At the present time, using separate criteria, neither program suggests a collaboration for any pair who were not seated as immediate neighbors; i.e., there are no false positives. It also should be pointed out that in our analysis it is not possible to determine the direction in which the cheating took place. In this circumstance, seating charts, proctor observation, and erasures give useful clues.

It is apparent from the above that different absolute values of PROB would be calculated if the values p_ik are generated from the responses of an entire class or from a limited subset of the class. This is significant only in the case of students who make grades in the D/F range. Our basis for "stratification" of subgroups was simply to compare "A" student suspects with "A" student peers, etc. Because most of our suspects were in the "A" and "B" grade range (McGill University standards), our comparisons almost always dealt with these groups. If the suspects from this range are compared to the whole class (using the answer distribution of the whole class), the final number of σ a "suspect" pair is situated from the mean remains virtually the same as when they are compared with their own "stratum". This was not expected intuitively by the authors. The apparent meaning is that pairs of students who have collaborated extensively produce exams that are quite noticeably different from those of random pairs, regardless of whether the random pairs are chosen from the whole class or from a selected subset. This is not always the case for students in the lowest grade ranges. Here, proper comparisons require stratification. If this is not handled as above and the whole-class distribution is used in the calculation, a few "apparent" very low PROBs are produced among the D/F group, thus implying copying. There can be relatively large numbers of shared identical wrong answers (EEIC) for two poor students. It is unlikely, however, that many poor students can (or would) "profitably" copy from other poor students in the first place. It also should be pointed out that no pair in this category has ever come close to fulfilling the criteria of similarity for suspects as defined by Figure 1 or Table 1. In addition, none of these pairs has ever been found seated in an adjacent fashion.

It is of special interest to note that the consensus answers of students in the lowest 10-20% ("F", "D" grades) of the classes examined consistently will achieve a solid passing grade, often as high as "B". Data supporting this interesting finding are summarized in Table 2. The implication is clearly that if a student has visual access to a number of other exams, as is the case in a crowded exam room, it is easily possible to take a quick "consensus" among 4-5 exams and choose the most popular answer.
A decent grade is guaranteed, even if the 4-5 students are not among the best!
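A minimal sketch of this "consensus" strategy follows; the helper name and the simple majority vote, with ties broken arbitrarily, are our assumptions:

    from collections import Counter

    def consensus_grade(visible_sheets, key):
        """Fraction correct earned by copying, for every question, the most
        popular answer among a handful of visible exams."""
        right = 0
        for k, correct in enumerate(key):
            votes = Counter(sheet[k] for sheet in visible_sheets)
            if votes.most_common(1)[0][0] == correct:
                right += 1
        return right / len(key)

    # Even when the visible sheets come from weak students, the modal answer
    # tracks the keyed answer often enough (cf. Table 2) to yield a passing grade.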


Table 2. Profile of the % Responses for Different Choices for the Whole Class and the A, B, C, and D/F Students (number of students in parentheses; 109 students in all) for a Typical Question on a Multiple-choice Exam

              choice 1     2      3      4      5    % correct
Class (109)       9       68     17      5      1       68
A (16)            6       81      6      0      6       81
B (39)           10       77     13      0      0       77
C (19)           10       68     16      6      0       68
D/F (35)          8       51     28     11      0       51

The majority of students in each grade category chose response 2, the correct answer. As mentioned in the text, this situation is commonly found in exams.

It can be gleaned from the representative data in Table 2 that the general profile of answer patterns does not change much from the whole class to the "A", "B", or "C" students. This means that nearly the same deviations from the mean are found for outliers using the response distribution for the whole class or any individual subgroup down to the D/F students. In this latter group the response profile can become significantly different from that of the "A-C" grade group. Thus, the probability calculations for the D/F student group need to be related to their own distribution rather than to the one for the whole class or any other subgroup. In our experience, this is a moot point in that we have very rarely found any pair of D/F students who had very similar responses; thus, a calculation of their probabilities is not normally important.

Of considerable practical utility in identifying outliers is the plot shown in Figure 1. Here, the dark triangles represent eight different pairs (16 students) who literally stand out as being unusually similar when compared to the pairings for the rest of the class. These eight pairs all sat as immediate neighbors. All had odds much greater than ~4σ (10,000/1) against their papers having been written by chance. In fact, the initial suspect pair (74 correct versus 1 difference in Fig. 1) had a value of nearly 21σ! Here the odds are >10^71/1. This number approaches the probability of selecting one particular atom out of all of those in the universe (10^78) (10).

Over the past four years, about 75 different exams from five different institutions (using term tests as well as finals) were evaluated using this system. These exams were taken from chemistry (organic and the general "World of Chemistry" course mentioned in detail in the text), physics, pathology, physiology, biology, psychology, and educational psychology. The number of questions varied from 20 to 120, with overall class averages ranging from about 55 to 85% and with most of them averaging ~70%. Unusually similar pairs of answer sheets do not "stand out" very clearly for tests of 25 questions or fewer, or for tests where the class average is ~80% or more. It would be our conclusion that effective monitoring is best achieved for exams having 30 or more test items where the class average is ~80% or less.

In all cases where the exam was in the single-version format and seating was not directed to be random, "outliers" were found. In all cases, if the "outlier pair" (a) had more than 75% of their "in common" wrong answers identically answered, (b) showed up on the plot (Fig. 1) as distinct from the rest of the class, and (c) differed from non-suspects by ~5σ, they were investigated for their seating positions. In every case to date, pairs of students meeting these criteria had immediately proximate seating. As a consequence, their names were reported to the appropriate associate deans. The disposition of some of these cases is pending.

When students are confronted with almost any kind of copying "problem" during exams, they frequently offer the time-honored excuse that they had studied together, thus "explaining" the great similarities in that they "knew the


same things". We must admit that on first consideration, this seems plausible. It certainly is possible that course segments are taught poorly; textbook errors and mutual misunderstanding of information by the students could therefore account for some percentage of answer coincidences. Similarities are expected on essay-type or problem-solving exams, but word-for-word sections are not. The same concept holds for multiplechoice tests. In the first place, "distractors" (wrong answers or responses that the instructor creates durine the drawine-uu .. . of the exam) usually have no relevance to material actually delivered in the classroom. It follows that students would (in most all cases) be unable to study such material. This is the reason that., (since the advent of this t w e of exam in the 1920's . (711,evidence for copying has c&red on "exact wrongs" made by students. Furthermore, if one were to imagine a pair of students whose exams might be predicted to be similar in an honest way, it would be identical twins who study together. Fortunately, we had two pairs of identical twins in our classes. Both pairs of twins made nearly identical grades in all of their courses. One pair studied together intensively and the other pair, less so. In both situations, while the grades were similar. the number of differences in their exams were not found to he significantly different from the most tvoical uair of students in their made ranee as disulaved inblot/similar to Figure 1. Finally, one pair of students detected as "suspects" in three different exams in three different university courses were evaluated for their similarities on an exam where thev wrote scrambledversions (the content beine identical: butwith scrambled questions andfor answers). They could not have collaborated. The auestions and answers were unscrambled to permit a comparison of the degree of identical intellectual content. The results revealed that the two students had absolutely normal numbers of differences in their work. Thus, we feel that the "We studied together!" defense has an extremely weak basis, if any a t all, in supporting unusually similar exams as described here. We wished to demonstrate clearly that commonsense measures such as random seating, multiple-version (multiple form) exams, and widely spaced seating would prevent detectable cheating. Under these controlled conditions, no unusual outliers were detected a t all! To date, there have not been any pairs who took exams where these prevention measures were used (and who were more similar than the rest of the class) have even been seated near one another. Furthermore, when the probabilities are calculated for such pairs, they have been within -3.50 from the mean. There have been special exceptions to this result from time-to-time. Recently, during a McGill exam, as well as in two different exams at two different Ivy League institutions, a pair of students (remarkably) copied -80% of the answers from a different exam version. In two of the cases, the students confessed when confronted with the situation. The extreme low mark obtained when the exam was graded against the "proper" version of the examination (-20%) was the "tip-off" and the computer programs in


⁵The papers cited in the literature (5-8) describe the major studies involving statistical methods for discovering copying. By necessity, all of these deal with the manipulation of student responses in a paired fashion, which emphasizes the likelihood of mutual answer strings or identical wrong answers. Some of these papers only use data from computer-simulated exams. While these give excellent guidelines for further study, the review by Hanson, Harris, and Brennan indicates that "One of these limitations is that it is likely that none of the types of simulated copying studied here completely describes the copying that occurs by any real examinee." In addition, other articles cited in (8) study only a single exam.


these cases were not required to spot the offense. In the third case the student "stonewalled" in spite of incontrovertible odds against this random "stealing", an astonishing 10^17/1! On this evidence the associate dean awarded the student an "F"; the student did not appeal and withdrew from the university.

An indication that flagrant use of the work of others is not limited to undergraduates has recently surfaced. The Dean of the Faculty of Communication at Boston University was accused of, and eventually admitted to, plagiarizing significant sections of his convocation speech (11). Interestingly, the New York Times reporter covering the story was accused of taking significant sections of his article (without attribution from the initial report) from the Boston Globe! An indication of the bizarre excuses offered when exam takers are caught cheating was reported in The New Yorker magazine, Sept. 17, 1990, p 110 (from the Los Angeles Times): "Eleven National City police officers were caught cheating on a promotion exam. However, no disciplinary action was taken against them, because they had not been specifically instructed not to cheat." An incident worth noting involves a cheating situation at the Massachusetts Institute of Technology concerning the limits of collaborative work on scientific projects (12). Specifically, students were assigned to generate original computer programs but shared their results extensively against course policy. Finally, in a landmark case, a Maryland judge sentenced a student to six months in jail; this student admitted paying another to write his exams (13).


Conclusions

There are a number of conclusions that can be made. First, there is a considerable level of extreme cheating that takes place on multiple-choice exams (~5%; in Fig. 1 there are eight darkened "spots" representing 11 unique pairings and 16 different people out of 340 students). From detailed analysis of some of the exams, partial copying clearly takes place that would support a cheating figure of ~10% or more. Second, simple prevention measures appear to eliminate detectable copying completely. Finally, the excuse offered by students that their papers are similar because of their having studied together is fatuous.

There are several novel features in our approach. The fundamental data which indicate exams that are unusually similar (Table 1 and Fig. 1) can be generated by simple sorting programs.⁴ Our confidence that these pairs are linked statistically is confirmed by their immediate seating arrangement in the exam room. The probability calculations, as well as a near-perfect correlation with the results on our data, are obtained by the Frary method (6). In addition to the simplicity of our system of sorting, this approach provides immediate visual recognition of the suspect pairs. To date, every obvious outlier pair in plots like Figure 1 has been seated together, and they have exhibited high (5σ) deviation from the mean when compared to a group who did not sit near one another. Our criteria (% wrong-answer similarity (Table 1), difference plot (Fig. 1), seating proximity, and probability data) give us high confidence that we have made no false positives to date. One final feature of this work, differing from most previous efforts (8) in this area, involves the use of a substantial variety of real exams where seating is known.⁵

While we appreciate the "big-brother" appearance of this study, we believe it is the only way to expose the problem. It demonstrates that low-key measures of seat and exam scrambling solve it, and it induced our administration to adopt appropriate exam security measures. The following recommendations were made for our examination system at McGill University in the summer of 1990.


Recommendations

1. Multiple-version exams where appropriate.
2. Random seating (with charts) for semester exams and "stripe" seating (alternate rows with a different course, with charts) for final exams.
3. Use of "exam verification procedures" (computer analysis) for all multiple-choice exams as a detection/deterrent device.

In October of 1990, the McGill Senate voted to adopt measures #1 and #2 for use across the university. It should be noted that to date, compliance with the new regulations has been high although, as expected, it took some time for the regulations to be employed routinely. It is clear that under these relatively simple regulations, cheating will be substantially decreased (if not eliminated) in multiple-choice exams and likely greatly reduced in the other methods of examination.

The exact fate of the monitoring system continues to be discussed. It could be used routinely to monitor all multiple-choice exams and in this fashion serve to inhibit students from cheating. Obviously, it would quickly reveal if exam procedures were not being followed properly. It seems to us that rather than this "big brother" approach to the problem, there is benefit gained in relying upon the students' integrity to report significant deviations from established exam procedures to a designated university official (ombudsman). It is our belief that if students are seated randomly, spread out reasonably, and, where possible, given multiple-version exams, there should be essentially no cheating of the type described.

We have discussed this problem with colleagues at over two dozen institutions, including some of the most highly regarded in Canada, the United States,⁶ and the United Kingdom. While some institutions (notably in the U.K.) employ several of these measures on a routine basis, the vast bulk of North American exams are not subject to these simple constraints, and the result is that cheating is prevalent.

⁶A recent broadcast on National Public Radio described the cheating phenomenon in the United States in some detail: "The Talk of the Nation", May 18, 1992.

Acknowledgment

We thank William Leggett (Academic Vice-Principal), T. H. Chan (Dean, Faculty of Science), and Roger Rigelhof (Associate Dean, Faculty of Science) of McGill University for cooperation in assembling the data for this study, as well as Robert Frary of Virginia Polytechnic Institute (Blacksburg, VA) for valuable discussions and extensive cooperation. We are especially grateful to Debra Simpson (Computer Centre, McGill University) for devising one of the computer programs, to James Jennings (Cornell University) for special computational assistance, and to Timothy Rahily (McGill University) for help in searching the literature in this unusual field. Finally, we offer special thanks to Edith Zorychta and James Ramsay (McGill University) and J. A. Schwarcz and A. E. Fenster (Vanier College and McGill University) for valuable discussions, and to our colleagues in a variety of institutions for their continued support.

9. Jefferys, W. H.; Berger, J. O. Am. Sci. 1992, 80, 64.
10. Shklovskii, I. S.; Sagan, C. Intelligent Life in the Universe; Holden-Day, 1966; p 130.
11. The New York Times, July 3, 1991.
12. The New York Times, May 22, 1991.
13. The Washington Post, Oct. 24, 1982.
