The Exam Statistics: The Difficulty

In my last post, I discussed how I analyze the mean in my (multiple choice) exams. This time, I'm going to look at difficulty. This is not directly related to the mean. Huh? Isn't it the case that, the more difficult the exam, the lower the mean? Well, yes. But that's not the "difficulty" I'm writing about.

Among all the pages and pages of results I get from Test Scoring & Questionnaire Services is the "DIF" score or difficulty of each question. It's actually the proportion of the class who answered that question correctly. DIF=1.000 means that everyone got it right, but DIF=0.250 means that only 25% of the class did. But it's not really "difficulty," is it? If a question is really difficult, fewer people will answer it correctly and the number should decrease. So, really, it shouldn't be called difficulty, it should be called easiness. But, look, it's just called "difficulty," OK?
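If you want to see the arithmetic, here is a minimal sketch of how a DIF score could be computed. To be clear, this is not the scoring service's actual code. The function name `item_difficulty`, the answer key, and the four students' responses are all made up for illustration.

```python
def item_difficulty(responses, key):
    """Return the proportion of students who answered each question correctly."""
    n_students = len(responses)
    return [
        sum(1 for student in responses if student[q] == correct) / n_students
        for q, correct in enumerate(key)
    ]


# Hypothetical data: 4 students, 3 questions (not real exam results)
key = ["A", "C", "B"]
responses = [
    ["A", "C", "B"],
    ["A", "C", "D"],
    ["A", "B", "D"],
    ["A", "D", "D"],
]

dif = item_difficulty(responses, key)
for number, score in enumerate(dif, start=1):
    print(f"Question {number}: DIF={score:.3f}")
```

In this made-up class, everyone gets the first question right (DIF=1.000), and only one student in four gets the last one (DIF=0.250).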

You might be thinking that I want everyone to answer every question correctly, right? Um, sorry to rain on your ice cream, but it's really, really unlikely that everyone was able to learn absolutely everything in the course, and was also able to remember and apply that knowledge on an exam perfectly correctly for every question. What an exam should do is assess each student's learning of the material, and provide some way of differentiating among all students. If all questions are answered correctly, the exam itself has failed.

I went to a seminar last year at which a renowned expert in testing and exam question construction gave a talk. After it was over, I talked to him about DIF scores--specifically, what should they be? The general rule is that an exam question is doing a good job of differentiating among students if its DIF is at least 0.300. That is, at least 30% of the class should be getting each question correct. There is no guideline for the upper end, but at another seminar, I heard an instructor say that she liked to put at least one DIF=1.000 question on each exam as a confidence booster. Yup, a gimme. I thought that was a pretty nice thing to do, so I try to include at least one high DIF question on every one of my exams, too.

So difficulty is related to the mean in that, the higher the DIF, the higher the mean on the exam overall. The mean is good for evaluating the overall performance of the class. But I also need to evaluate the questions on my exams, so I get the DIF score for each one. If the DIF is too low, the question either gets killed (*snff*), or rewritten to clarify it. Oh, and if I ever get DIF=0.000, it means I've keyed in an incorrect answer. Ooops.
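That review could be sketched roughly like this. It assumes the DIF scores are already sitting in a list, it uses the 0.300 rule of thumb as the cutoff, and `review_items` and the scores themselves are hypothetical, not anything the scoring service provides.

```python
def review_items(dif_scores, floor=0.300):
    """Flag questions that need a second look, keyed by question number."""
    flagged = {}
    for number, dif in enumerate(dif_scores, start=1):
        if dif == 0.0:
            # Nobody got it right: most likely a mistake in the answer key
            flagged[number] = "check the answer key"
        elif dif < floor:
            # Below the rule-of-thumb cutoff: rewrite it or kill it
            flagged[number] = "rewrite or drop"
    return flagged


# Hypothetical DIF scores for a five-question exam
flags = review_items([1.000, 0.620, 0.250, 0.000, 0.300])
for number, note in flags.items():
    print(f"Question {number}: {note}")
```

A question sitting exactly at 0.300 passes, because the rule says "at least" 0.300.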

Why aren't you studying?

The Exam Statistics: The Mean

With the first round of (multiple-choice) midterms over, I'm now swimming in data. I want to tell you about some of the stats I go through to assess and improve my exams. Unfortunately, I'm too late to celebrate (the first) World Statistics Day. But I don't feel too bad. At least statistics has a day. It's not like there's a "Psychology Month" or anything. Oh, look--yes there is. And I'm late for that, too. Moving on...

This installment is about the (arithmetic) mean, or, if you insist, the "average." I post the class mean of every exam because you demanded it! Really, though--what use is it to you? For classes that don't grade on the curve, you don't need to know the mean (or standard deviation) to determine your absolute standing in the class. Just take your percentage correct, and see what grade that corresponds to in the syllabus. Right?

Yes, that's important. But don't you want to know how everyone else did, too? Sure you do. "Did everyone think that exam was a killer, or just me?" We want to compare ourselves to other people. Some students even want to know what the top score was. "Did anyone get 100%?" "Am I the best in the class?"

The mean also serves another purpose, when there are multiple forms of an exam. In larger classes, multiple forms of an exam are used to discourage cheating (or at least, to make it more difficult). Typically, there is one form that has the questions arranged in order of topics (e.g., questions based on the first lecture and textbook chapter first, followed by questions on the second lecture and chapter, etc.). The other forms will have the questions in a random order. Are students who get the scrambled forms at a disadvantage? Or, put another way, is there a benefit to answering questions in a sequence that reflects the arrangement of the learning materials? If so, that wouldn't be fair, would it?

The data from every exam includes the means from each form. They are usually a little bit different. But is that difference a fluke, or is it due to the ordering of questions? Hmm, sounds like a job for...statistics! The data also includes the results of an ANOVA (analysis of variance) that compares the means to each other. That is, are any differences statistically significant? The answer: No. I've never had a difference at p < 0.01 or even p < 0.05. That means any differences are small enough to be put down to chance.
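For the curious, here is a rough sketch of the F statistic behind a one-way ANOVA, using only plain Python. The scores for the three forms are invented, `one_way_anova_f` is a name I made up, and this is a textbook calculation rather than whatever the scoring service actually runs. Turning F into a p-value takes the F distribution (a stats package or a table), which is left out here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the form means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own form's mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical percentage scores on three forms of the same exam
form_a = [60, 70, 80]
form_b = [62, 72, 82]
form_c = [64, 74, 84]

f_stat, df_b, df_w = one_way_anova_f([form_a, form_b, form_c])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

When F is small, the spread between the form means is small compared to the spread of scores within each form, which is the pattern you'd expect if the ordering of questions makes no difference.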

The bottom line: It doesn't matter which form you get. Isn't science cool?

Why aren't you studying?

The Coffee

When I went to high school, it wasn't cool to drink coffee. Coffee was dark, scary, and bitter. Sure, my family would have Kaffeezeit ("coffee time") on the weekends, but I was just in it for the Kuchen.

When I started university, I drank a cup of tea with milk and sugar every single morning. Even though I had a lot of 8:00 classes (because they were good classes only offered at that time, that's why), one cup of English breakfast tea was all the caffeine I needed. Some of my friends became desperate around exam time, and dipped into the go-juice. It was hilarious to watch as the normally non-caffeine consumers' eyes got really big after having a big cuppa joe. Then, they'd study like the dickens. This proved that coffee was a dangerous, dangerous substance.

Then, I started graduate school. Sure, getting a graduate degree is pretty demanding. Maybe I'd have an extra cup of tea once in a while. But the sheer, stark terror of almost having to go into the Real World was enough stimulus for me--no coffee, thanks. Maybe just a bit more sugar.

Then, one term I was Dr. Dawson's teaching assistant. He made me come to the class (I dunno, to learn something I guess); because he was on my supervisory committee, it's not like I could say no or anything. The first day of class, I met him at his office and we went to the class together. But not before he poured himself a cup of freshly made French-pressed coffee. And then he insisted on giving me a cup. It's not like I could say no or anything. The worst thing was that the coffee was: black. As black as night. No milk, and certainly no sugar (isn't that freebasing?). After that, I brought my own mug and poured sugar into it first--milk was too conspicuous.

Now, I'm neither a connoisseur (yes, I drink instant--please forgive me), nor a coffee-hound. Usually, I get by with only one cup of coffee. But if I've had a bad sleep (which does happen), you'll see me toting a cup from one of the fine local purveyors. Not my usual mug-o-water; not tea; not vodka. I have gone over to the dark side. With lots of milk and sugar.

Why aren't you studying?
