In the Classroom
Cheating Probabilities on Multiple Choice Tests

Gaspard T. Rizzuto, Department of Statistics, P.O. Box 44370, University of Southwestern Louisiana, Lafayette, LA 70504-4370
Fred H. Walters, Department of Chemistry, P.O. Box 44370, University of Southwestern Louisiana, Lafayette, LA 70504-4370
It is not well known that the probability of a student having the same answers as another student can be easily calculated using the binomial distribution (1).

This paper is strictly based on mathematical statistics; as such it does not depend on prior performance, and it assumes the probability of each choice to be identical. In a real-life situation, the probability of two students having identical responses becomes larger the better the students are. However, the mathematical model is developed for all responses, both correct and incorrect, and provides a baseline for evaluation. David Harpp and coworkers (2, 3) at McGill University have evaluated ratios of exact errors in common (EEIC) to errors in common (EIC) and differences (D). In pairings where the ratio EEIC/EIC was greater than 0.75, the pair had unusually high odds against their answer pattern being random, and EEIC/D ratios at values >1.0 indicated pairs of students who were seated adjacent to one another and copied from one another. The original papers should be examined for details.
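The exact definitions and procedures are given in the original papers (2, 3). Purely as a rough illustration, one can assume that EIC counts questions both students answered incorrectly, EEIC counts those answered incorrectly with the same wrong choice, and D counts questions on which the two answer strings differ; under those assumptions the ratios could be computed as in the sketch below (the function name copying_indices and the example answer strings are our own, not taken from refs 2 and 3):

def copying_indices(answers_a, answers_b, key):
    """Rough illustration of the EEIC/EIC and EEIC/D ratios, under assumed definitions:
    EIC  = questions both students answered incorrectly,
    EEIC = questions both answered incorrectly with the same wrong choice,
    D    = questions on which the two students gave different answers."""
    eic = eeic = d = 0
    for a, b, k in zip(answers_a, answers_b, key):
        if a != b:
            d += 1
        if a != k and b != k:
            eic += 1
            if a == b:
                eeic += 1
    eeic_eic = eeic / eic if eic else 0.0
    eeic_d = eeic / d if d else float("inf")
    return eeic_eic, eeic_d

# Hypothetical example: two answer strings and the answer key for a 10-question test.
print(copying_indices("ABCDABCDAB", "ABCDABCDCB", "ABCDAACDBB"))   # -> (0.5, 1.0)
# Refs 2 and 3 flagged pairs with EEIC/EIC > 0.75 (and EEIC/D > 1.0).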
For the binomial distribution, let

n = number of multiple choice questions on the exam
p = 1/(number of choices for each question)
x = number of questions whose answers are the same

The expected value is np and the variance is σ² = np(1 – p). For example, in a test of 20 multiple choice questions with 5 choices on each question, n = 20 and p = 1/5. The expected value, or mean, is 20 × 1/5 = 4 and the variance is 20 × 1/5 × 4/5 = 3.2. The probability density function is

f(x) = [n!/((n – x)! x!)] p^x (1 – p)^(n – x)
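As a quick numerical check, the mean, variance, and point probability can be evaluated directly. The following minimal Python sketch (the helper name binom_pmf and the use of math.comb are our own choices, not part of the original paper) reproduces the figures used in this section:

from math import comb

def binom_pmf(x, n, p):
    """Binomial point probability: n!/((n - x)! x!) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 1/5                          # 20 questions, 5 choices per question
print(n * p)                            # expected value (mean): 4.0
print(round(n * p * (1 - p), 2))        # variance np(1 - p): 3.2
print(round(binom_pmf(5, n, p), 4))     # P(exactly 5 answers in common): 0.1746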
So if two students have 5 of the 20 questions marked the same, the probability is

P = [20!/(15! 5!)] (1/5)^5 (4/5)^15 = .1746
Probability values can easily be set as guides (e.g., P = .001 or less), or tables of questions with common answers versus probability can be generated with the above information (see Table 1).
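Such a table can be produced by evaluating the probability density function over a range of counts. The minimal sketch below (our own illustration, not code from the paper) takes each test configuration of Table 1, searches upward from the expected value np for the first number of common answers whose probability falls below the P = .001 guide, and prints the pair of values straddling that cut-off:

from math import comb

def binom_pmf(x, n, p):
    """Binomial point probability for x answers in common (same formula as above)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Test configurations from Table 1: (number of questions, choices per question).
for n, choices in [(20, 5), (40, 5), (50, 5)]:
    p = 1 / choices
    # First count at or above the mean np whose point probability drops below .001.
    cutoff = next(x for x in range(int(n * p), n + 1) if binom_pmf(x, n, p) < 0.001)
    print(f"{n} questions, {choices} choices: "
          f"P({cutoff - 1} in common) = {binom_pmf(cutoff - 1, n, p):.3g}, "
          f"P({cutoff} in common) = {binom_pmf(cutoff, n, p):.3g}; "
          f"{cutoff} answers in common needed for P < .001")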
Table 1. Cheating Probabilities for Typical Test Configurations (a)

    20 questions, 5 choices      40 questions, 5 choices      50 questions, 5 choices
    No. in common    P           No. in common    P           No. in common    P
         4     .2182                  8     .1560                 10     .1398
         5     .1746                 10     .1075                 15     .02992
         6     .1091                 15     .00498                19*    .00158
         7     .0545                 16*    .001945               20*    6.12 × 10^-4
        10*    .00203                17*    .000648               25     1.60 × 10^-6
        11*    .000462               20     1.66 × 10^-5          30     5.83 × 10^-10
        12     8.65 × 10^-5          25     4.75 × 10^-9          35     2.72 × 10^-14
        15     1.7 × 10^-7           30     9.77 × 10^-14         40     1.21 × 10^-19
        20     1.05 × 10^-14         40     1.10 × 10^-28         50     1.13 × 10^-35

(a) Multiple-choice-type test. "No. in common" indicates answers common to a pair of students; P is the probability of cheating associated with this number. Asterisked values mark the number of common answers required for significance at the P < .001 level for each test configuration; the values on both sides of the cut-off are marked.

Literature Cited

1. Hogg, R. V.; Tanis, E. A. Probability and Statistical Inference, 2nd ed.; Macmillan: New York, 1983.
2. Harpp, D. N.; Hogan, J. J. J. Chem. Educ. 1993, 70, 306.
3. Harpp, D. N.; Hogan, J. J.; Jennings, J. S. J. Chem. Educ. 1996, 73, 349.