The Exam Statistics: The Point Biserial Correlation

I'm continuing my explanation of the reams of statistics I get about multiple choice exams. Last time, I explained exam item difficulty scores. (Fascinating, no?) This time: point biserial correlation coefficient, or "rpb". That is, "r" for the correlation coefficient (why, oh why is it the letter r?) and "pb" to specify that it's the point biserial and not some other kind of correlation. Like, um, some other kind.

If I've constructed a good exam item, it should be neither too hard nor too easy. It should also differentiate among students. But I can't tell how well it does that just by looking at the difficulty score. Instead, there's a more complex measure, the rpb. In general, I need a correlation index for a categorical variable with a continuous variable. More specifically, I want to correlate the categorical variable of a test item (i.e., whether a student answered the test item correctly or incorrectly), with the continuous variable of the student's percent score on the examination. Got that? I didn't think so.

Let me try again. Student A did well on the exam, getting 90% correct. Student B did not do so well, getting only 50%. If I look at any given exam question, in general, student A should be more likely to answer it correctly than student B. This is not the same as difficulty, because I'm not simply looking at what proportion of the class answered the question correctly. I'm correlating each student's score with their performance on each question. The key to all this is the word "should" in the sentence above.

If an exam item is poorly constructed for whatever reason, students who did well on the exam overall may do worse on it than students who did poorly overall. That is, the better you are overall, the less likely you are to answer it correctly. That is not supposed to happen. The rpb gives me this information for each question on the exam. Experts in exam construction recommend that the rpb should range from 0.30 to 1.00. Any question getting an rpb lower than 0.30 means that I will take a look at it and try to figure out why that's happening.

And if the rpb is negative, well...it's a negative correlation. That's the worst case I described: better students are doing worse answering this question, and poorer students are doing well. I won't use any questions getting a negative rpb again unless I can figure out why it's happening. Maybe I can tweak the question, maybe I have to rewrite it to ask about the same knowledge in a different way. Or maybe I'll just give up entirely, go and get a coffee, and check out some LOLcats.
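For the curious, here is a rough sketch of how an rpb could be worked out by hand. It is not the code the scoring service runs (I have no idea what that looks like); the function names and the five made-up students are my own assumptions. It uses the textbook formula: the mean exam score of students who got the item right, minus the mean of those who got it wrong, divided by the standard deviation of all scores, times the square root of p(1 - p), where p is the proportion who answered correctly. It also assumes the exam score still includes the item itself.

```python
from statistics import mean, pstdev


def point_biserial(item_correct, exam_scores):
    """Point-biserial correlation between a right/wrong item and exam scores.

    item_correct: list of 1 (correct) or 0 (incorrect), one per student.
    exam_scores:  list of percent scores, in the same student order.
    Returns None when the correlation is undefined (everyone right,
    everyone wrong, or no spread in the exam scores).
    """
    if len(item_correct) != len(exam_scores):
        raise ValueError("need one item result and one score per student")
    right = [s for c, s in zip(item_correct, exam_scores) if c]
    wrong = [s for c, s in zip(item_correct, exam_scores) if not c]
    spread = pstdev(exam_scores)  # population SD of all exam scores
    if not right or not wrong or spread == 0:
        return None
    p = len(right) / len(exam_scores)  # proportion who answered correctly
    return (mean(right) - mean(wrong)) / spread * (p * (1 - p)) ** 0.5


def review_flag(rpb):
    """Apply the rule of thumb from the post (hypothetical labels)."""
    if rpb is None:
        return "undefined"
    if rpb < 0:
        return "negative: do not reuse as-is"
    if rpb < 0.30:
        return "low: take a look"
    return "ok"


# Five made-up students, best to worst.
scores = [90, 80, 70, 60, 50]
good_item = [1, 1, 1, 0, 0]  # the stronger students got it right
bad_item = [0, 0, 1, 1, 1]   # the stronger students got it wrong

print(round(point_biserial(good_item, scores), 3))  # 0.866
print(round(point_biserial(bad_item, scores), 3))   # -0.866
print(review_flag(point_biserial(bad_item, scores)))
```

The second item is the worst case: the top two students missed it, and the rpb comes out negative.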

Why aren't you studying?

The Exam Statistics: The Difficulty

In my last post, I discussed how I analyze the mean in my (multiple choice) exams. This time, I'm going to look at difficulty. This is not directly related to the mean. Huh? Isn't it the case that, the more difficult the exam, the lower the mean? Well, yes. But that's not the "difficulty" I'm writing about.

Among all the pages and pages of results I get from Test Scoring & Questionnaire Services is the "DIF" score or difficulty of each question. It's actually the proportion of the class who answered that question correctly. DIF=1.000 means that everyone got it right, but DIF=0.250 means that only 25% of the class did. But it's not really "difficulty," is it? If a question is really difficult, fewer people will answer it correctly and the number should decrease. So, really, it shouldn't be called difficulty, it should be called easiness. But, look, it's just called "difficulty," OK?
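If you want to see just how simple DIF is, here is a tiny sketch. The function name, the answer letters, and the four made-up students are assumptions for illustration, not anything from the actual scoring reports.

```python
def dif_score(responses, key):
    """Proportion of students whose response matches the answer key."""
    if not responses:
        raise ValueError("no responses to score")
    return sum(1 for r in responses if r == key) / len(responses)


# Four made-up students answering one question whose correct answer is "B".
print(dif_score(["B", "B", "A", "B"], "B"))  # 0.75
```

Three of four students chose "B", so DIF = 0.750. A higher number means an easier question, which is exactly why "easiness" would be the better name.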

You might be thinking that I want everyone to answer every question correctly, right? Um, sorry to rain on your ice cream, but...no. It's really, really unlikely that everyone was able to learn absolutely everything in the course, and was also able to remember and apply that knowledge on an exam perfectly correctly for every question. What an exam should do is assess each student's learning of the material, and provide some way of differentiating among all students. If all questions are answered correctly, the exam itself has failed.

I went to a seminar last year at which a renowned expert in testing and exam question construction gave a talk. After it was over, I talked to him about DIF scores--specifically, what should they be? The general rule is that an exam question is doing a good job of differentiating among students if its DIF is at least 0.300. That is, at least 30% of the class should be getting each question correct. There is no guideline for the upper end, but at another seminar, I heard an instructor say that she liked to put at least one DIF=1.000 question on each exam as a confidence booster. Yup, a gimme. I thought that was a pretty nice thing to do, so I try to include at least one high DIF question on every one of my exams, too.

So difficulty is related to the mean in that, the higher the DIF, the higher the mean on the exam overall. The mean is good for evaluating the overall performance of the class. But I also need to evaluate the questions on my exams, so I get the DIF score for each one. If the DIF is too low, the question either gets killed (*snff*), or rewritten to clarify it. Oh, and if I ever get DIF=0.000, it means I've keyed in an incorrect answer. Ooops.
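Here is a small sketch of both ideas: the link between DIF and the mean, and the question triage. It assumes every question is worth one mark, in which case the class mean (as a proportion) is exactly the average of the DIF scores. The data, the function name, and the labels are all made up for illustration.

```python
from statistics import mean

# Rows are students, columns are questions; 1 = correct, 0 = incorrect.
results = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

# DIF per question: proportion of the class answering it correctly.
dif = [mean(column) for column in zip(*results)]

# Proportion correct per student, then the class mean.
student_scores = [mean(row) for row in results]
class_mean = mean(student_scores)


def triage(dif_value):
    """Hypothetical labels for the rules of thumb in the post."""
    if dif_value == 0:
        return "check the answer key"
    if dif_value < 0.30:
        return "rewrite or drop"
    return "keep"


for number, value in enumerate(dif, start=1):
    print(f"Q{number}: DIF={value:.3f} -> {triage(value)}")

# With one mark per question, these two numbers are the same.
print(class_mean, mean(dif))
```

Question 1 is the gimme (DIF=1.000), question 3 falls below 0.300, and question 4 is the "Ooops" case.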

Why aren't you studying?

The Exam Statistics: The Mean

With the first round of (multiple-choice) midterms over, I'm now swimming in data. I want to tell you about some of the stats I go through to assess and improve my exams. Unfortunately, I'm too late to celebrate (the first) World Statistics Day. But I don't feel too bad. At least statistics has a day. It's not like there's a "Psychology Month" or anything. Oh, look--yes there is. And I'm late for that, too. Moving on...

This installment is about the (arithmetic) mean, or, if you insist, the "average." I post the class mean of every exam because you demanded it! Really, though--what use is it to you? For classes that don't grade on the curve, you don't need to know the mean (or standard deviation) to determine your absolute standing in the class. Just take your percentage correct, and see what grade that corresponds to in the syllabus. Right?

Yes, that's important. But don't you want to know how everyone else did, too? Sure you do. "Did everyone think that exam was a killer, or just me?" We want to compare ourselves to other people. Some students even want to know what the top score was. "Did anyone get 100%?" "Am I the best in the class?"

The mean also serves another purpose, when there are multiple forms of an exam. In larger classes, multiple forms of an exam are used to discourage cheating (or at least, to make it more difficult). Typically, there is one form that has the questions arranged in order of topics (e.g., questions based on the first lecture and textbook chapter first, followed by questions on the second lecture and chapter, etc.). The other forms will have the questions in a random order. Are students who get the scrambled forms at a disadvantage? Or, put another way, is there a benefit to answering questions in a sequence that reflects the arrangement of the learning materials? If so, that wouldn't be fair, would it?

The data from every exam includes the means from each form. They are usually a little bit different. But is that difference a fluke, or is it due to the ordering of questions? Hmm, sounds like a job for...statistics! The data also includes the results of an ANOVA (analysis of variance) that compares the means to each other. That is, are any differences statistically significant? The answer: No. I've never had a difference at p < 0.01 or even p < 0.05. That means any differences are small enough to be explained by chance.
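In case you're wondering what the ANOVA is doing under the hood, here is a bare-bones sketch of a one-way ANOVA F statistic, using made-up scores for three forms. The function name and the data are assumptions; the real analysis comes from the scoring service. It compares how much the form means differ from each other (between-groups variance) with how much scores vary within each form (within-groups variance).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up percent scores on three forms of the same exam.
form_a = [60, 70, 80]
form_b = [65, 75, 85]
form_c = [70, 80, 90]

f_stat, df_b, df_w = one_way_anova_f([form_a, form_b, form_c])
print(f_stat, df_b, df_w)  # 0.75 2 6
```

An F below 1 means the form means differ less than you'd expect from the ordinary spread of scores within a form. Turning F into a p-value needs the F distribution; if SciPy is installed, scipy.stats.f_oneway does the whole thing, p-value included.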

The bottom line: It doesn't matter which form you get. Isn't science cool?

Why aren't you studying?

The Coffee

When I went to high school, it wasn't cool to drink coffee. Coffee was dark, scary, and bitter. Sure, my family would have Kaffeezeit ("coffee time") on the weekends, but I was just in it for the Kuchen.

When I started university, I drank a cup of tea with milk and sugar every single morning. Even though I had a lot of 8:00 classes (because they were good classes only offered at that time, that's why), one cup of English breakfast tea was all the caffeine I needed. Some of my friends became desperate around exam time, and dipped into the go-juice. It was hilarious to watch as the normally non-caffeine consumers' eyes got really big after having a big cuppa joe. Then, they'd study like the dickens. This proved that coffee was a dangerous, dangerous substance.

Then, I started graduate school. Sure, getting a graduate degree is pretty demanding. Maybe I'd have an extra cup of tea once in a while. But the sheer, stark terror of almost having to go into the Real World was enough stimulus for me--no coffee, thanks. Maybe just a bit more sugar.

Then, one term I was Dr. Dawson's teaching assistant. He made me come to the class (I dunno, to learn something I guess); because he was on my supervisory committee, it's not like I could say no or anything. The first day of class, I met him at his office and we went to the class together. But not before he poured himself a cup of freshly made French-pressed coffee. And then he insisted on giving me a cup. It's not like I could say no or anything. The worst thing was that the coffee was: black. As black as night. No milk, and certainly no sugar (isn't that freebasing?). After that, I brought my own mug and poured sugar into it first--milk was too conspicuous.

Now, I'm neither a connoisseur (yes, I drink instant--please forgive me), nor a coffee-hound. Usually, I get by with only one cup of coffee. But if I've had a bad sleep (which does happen), you'll see me toting a cup from one of the fine local purveyors. Not my usual mug-o-water; not tea; not vodka. I have gone over to the dark side. With lots of milk and sugar.

Why aren't you studying?

The Comics

Hey, who doesn't love comics? Not me! No, I don't not love comics. Um. Here are some of my favourite web comics.

PhD Comics is about grad.students who seem to have a problem finishing their theses. (Want to know how to drive a grad.student insane? Ask her if she's finished her thesis yet. Hee!) Even if you're not a grad.student, it's still pretty funny, poking fun at all kinds of academic matters. (This one made me LOL.) There are 3 comics per week (Mon/Wed/Fri).

xkcd isn't an abbreviation or acronym--it's just the title of a webcomic, aimed at people who have the ability to think. This rules out a lot of people who just won't "get it." Some gags require knowledge of science. Gasp! Conveniently comes out 2 times per week (Tues/Thurs).

Lab Bratz isn't just for lab ratz (er, rats). Has gags on academia, but doesn't require a degree to get the joke. Only 1 per month.

Do you have any favourite web comics? (To A.K.: Yes, I know about Salad Fingers, which is technically a cartoon. And yes, I read your blog!)

Why aren't you studying?

What I Did on my Summer Vacation (2010 version)

Went to Sylvan Lake, like last year. (Well, like every year, really.) Pros: No vehicle breakdowns this time. Cons: Smoke. *cough cough* The smoke drifting in from the forest fires in BC was so bad, it actually turned a sunny day into a cloudy one.

Next: Calgary. Yes, an Edmontonian can go to Calgary without being afraid to admit it. It's just important to leave Calgary again. The trip to the zoo was especially for my youngest daughter, so she could see real “tigas,” “elfints” and “zeebas”. Her big sister liked the playground best--no manure smell. As for me, well, I just went for the food.

I had my 25th high school reunion. It was a mindbending blast from the past, catching up with people I haven’t seen since high school. Others (like Eric in the picture), I lost touch with in the middle of undergrad years at university. Unfortunately, some old friends couldn’t make it to the get-togethers, but I was able to get in touch with them again via Facebook. (Yay, Facebook! Have you heard about it yet?)

Took my wife to the Lady Gaga concert. Had pretty lousy seats, but they were better than the ones that Calgary got. Bazinga! (Her tour didn't, um, stop in Calgary.) If you look really closely at the crappy photo I took using my crappy cell phone, you can see two blobs. One is Gaga, the other is the beautiful tongue of fire rising from her piano as she sang Speechless. Yes, I need a new cell phone.

Do I have to list all the Edmonton festivals I went to? Nah. You know all about those, right?

Beyond that, hmm, let's see: the usual collection of birthday parties, days in the park, BBQs with friends. Got an MRI on my knee. Woot! Sadly, it didn't leave me with any super powers.

And work, of course. There's always work (so, it wasn't completely a summer vacation, was it?). Instead of writing new lectures, I decided to improve the ones I've got. Although--I did come up with a new mini-lecture, by special request and...well, that's another blog entry.

Now, I've got a couple of questions: What did you do on your summer vacation?

Why aren't you studying?

The Awards: 3

The Department of Psychology's Teaching Honour Roll just came out for Winter, 2010. I'm happy to say that I (*modestly*) was placed on the Honour Roll with Distinction for all three of my courses. Woot! I also have to mention that 75% of instructors who taught in that term also got on the Honour Roll. Nice going, Department of Psychology colleagues! (But why the pic of the FIFA World Cup Trophy? Because...well, because. The World Cup is on right now. That's why.)

I also had the honour of winning two (!) of the inaugural Tolman Undergraduate Teaching Awards, or TUTAs. (Just say that out loud: toot-ahs, toot-ahs. Fun!) The two I won were:

  • “The adoption of fake accents for educational purposes” (blimey!) and
  • “Assignment most likely to result in a missing-persons report” (because of this assignment--but I've never lost a student...yet)

Why are they named for Tolman? He never studied/taught/researched at the UofA. It was the choice of the Associate Chair of the Department--she's got this quote in her email sig:

"Since all the sciences, and especially psychology, are still immersed in such tremendous realms of the uncertain and the unknown, the best that any individual scientist, especially any psychologist, can do seems to be to follow his own gleam and his own bent, however inadequate they may be. In the end, the only sure criterion is to have fun." (E. C. Tolman, 1959)

So, yeah, the "TUTAs" are tongue-in-cheek awards. But I'm still gonna frame them and hang them up somewhere. Maybe in my Awards Room. That's a good place for them. As soon as I get an Awards Room.

Finally, by popular demand, here are some selected comments from my courses in the Winter, 2010 term, followed up by the ever-popular snarky responses:

Intro psych:
"[I] pay to be taught, not to read a textbook"
"Textbook reading should NOT be mandatory for exams"
(Why no love for the textbook? You rated it 4.1/5, which is not spectacular, but not bad either. Like it or not, you're going to have to read in university.)
"exam...focuses too much on [lecture] notes"
"Exam questions need to be better constructed & peer-reviewed"
(OK, I admit I do have to work on my exam questions. But peer-review? And usually, I get criticized for having too few exam questions from my lectures.)
"notes are too straight forward, you can't understand"
(Er, what? I should make them less straightforward...so they are more understandable?)

Perception:
"not very helpful out of class"
(That's right: I'm not going to explain some theory when you run into me at West Edmonton Mall.)
"Your blog was very interesting & insightful"
(Thank you. That's a very interesting and insightful comment.)
"I missed a day and could not get the notes from the missed class"
(Did you ask me? All you have to do is ask me.)
"kept the class interested and attentive"
"tedious...class was very boring"
"repetitive...maybe try new ways of presenting information"
(What if you three were all trapped in an elevator for 41 hours?)
"I had no time to read [the textbook] since other psych courses also require textbook readings"
(OK, so it's the fault of those other courses. Those darn profs, making you read textbooks. Egad.)

Advanced Perception:
"would be nice to have a real textbook"
"readings were well chosen and definitely preferable to a textbook"
(There are no appropriate textbooks for a 300-level perception course. But that's OK, because you like the readings I chose.)
"quizzes were annoying...but in the end, I was thankful for them--it engages me & forced me to read [the assigned readings]"
"[quizzes] helped solidify my understanding of the main topics &...ensured I stayed up to date with the readings! It also taught me a useful study habit."
(See? Toldja.)
"I often left the class feeling as though he was talking down to us"
"was difficult to approach and was very short with me. I find him extremely snobby and condescending"
(I don't know how I screwed that up. I apologize, and I will honestly try to adjust my tone in the future.)
"forced me to work harder and think longer about the subjects covered" [in a positive context]
(Yeah, sorry about all that work and thinking.)

Why aren't you studying?
