The Gay Bisanz Memorial Turkey Drive

Dr Gay Bisanz taught me developmental psychology as an undergraduate. It wasn't a required course, but I took it anyway--I wanted to take as many psych courses as I could. It was a good choice. Sure, she taught me specific things about how people grow and develop, and general things about science, psychology, critical thinking--but she also showed me the importance of giving back to your community. Sadly, Gay Bisanz died of cancer on June 1, 2005.
Gay started the Department of Psychology's now-famous Turkey Drive. People in the Department--academic staff, support staff, post-docs, students, and more--give money that is directed to CBC Edmonton's Turkey Drive for the Food Bank. But beyond that, some people bake cookies for sale, make jewelry, and donate items for raffle--there are a lot of very creative ways to part people from their money. Some instructors have volunteered to catch a pie with their face if their class contributes more money than any other class.
Last year, a total of $6,531.91 was collected. This year, the Turkey Drive goes from November 24 to December 8. Stop by the Psychology General Office (BS P-217) and buy a cookie, or a raffle ticket for one of the really nice gift baskets up for grabs. (The 2-for-$1 white chocolate and macadamia nut cookies are my personal weakness.)
Why aren't you studying?
The Exam Statistics: The Q Score
This is my final post on the topic of exam statistics. Previously, I described my use of the mean, difficulty scores, and point-biserial correlation. This time: the dreaded Q score. (Just to clarify: I'm not describing the other "Q Score", which represents the public's familiarity with--and appeal of--a person, product, company, or television show. That's not dreaded at all.)
The dreaded Q score is not a statistic that I regularly receive with all my other exam stats. I have to put in a special request. It's extra work for the people over at TS&QS, which means there's an additional cost that must be paid by the Department of Psychology. TS&QS has to go into their database of exam results for my class and perform a statistical comparison between two (or more) specific students' answer sheets.
Here's where the dread comes in: Why would I want to statistically compare two (or more) students' exams? If I suspect them of cheating, that's why. Sometimes, the cheating is blatantly obvious. The cameras in the classrooms (you know about those, right?) may clearly show one person peering over at the exam of another. Other times, it's not so obvious. Why is that guy jittering in his seat, looking everywhere except at his exam? Maybe he's nervous, or has exam anxiety. Why is that girl acting squirrelly, shifting her eyes back and forth? Maybe she drank too much coffee, has caffeine overload, and now really, really has to pee. Whatever the case, the exam proctors will not interrupt any student taking the exam. Nope. We'll just let you do what you do. If that's cheating, so be it.
However, at the end of the exam, the answer sheets from any suspicious students are set aside. (Think you can fool us by not leaving at the same time, or handing in your exams to different proctors? Tsk. You don't know how many eyes are watching, do you?) Those answer sheets will be analyzed, and I will get the dreaded Q score. I don't want to say too much about how it works, so suffice it to say that it gives a probability that cheating has occurred, compared to chance. Maybe I'll write more about how I have to deal with cheating in another post, but for now, let's just say it involves a lot of dread.
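I won't spill TS&QS's actual method, but here's a toy sketch (in Python, with invented answer strings) of the general flavour of a chance-based comparison: among the questions two students both got wrong, count how often they picked the exact same wrong answer, and ask how likely that much agreement would be by chance alone. To be clear, this is my own illustration, not the real Q score calculation.

```python
from scipy.stats import binomtest

def suspicious_overlap(ans1, ans2, key, n_choices=4):
    """Toy illustration only--NOT the actual Q score method.
    Among items BOTH students got wrong, how often do they give
    the exact same wrong answer, compared to chance?"""
    both_wrong = [(a, b) for a, b, k in zip(ans1, ans2, key)
                  if a != k and b != k]
    matches = sum(a == b for a, b in both_wrong)
    # If wrong answers were picked independently, two students match
    # on any given wrong item with probability ~1/(n_choices - 1).
    chance = 1 / (n_choices - 1)
    return binomtest(matches, len(both_wrong), chance,
                     alternative="greater").pvalue

key = "BDACADBCBADCABDBCADC"   # answer key (invented)
s1  = "BDACBDBABADCABDBCADA"   # student 1's answers (invented)
s2  = "BDACBDBABADCACDBCADA"   # student 2's answers (invented)
print(f"p = {suspicious_overlap(s1, s2, key):.3f}")
# A small p means this much agreement on wrong answers is unlikely
# to be a coincidence.
```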
Although I have caught several cheaters over the years, I'm glad I've never had to deal with anything like what happened in Professor Richard Quinn's class recently. Yeesh.
(Cartoon by Frank Cammuso. It's important to give credit where credit is due. Otherwise, it's like, um...cheating.)
Why aren't you studying?
The Mouse
That's right, a mouse. I've got a mouse in my office. Well, it's probably not here full-time, but it does drop in and visit. Speaking of dropping, that's what's in the photo: droppings. See the two little black sesame-seed-looking things? Mouse poop. On my desk. (Describing it with a food metaphor is kinda making me queasy. Bleah.)
A couple of weeks ago, there was a knock at my door. A couple of jolly fellows were putting mouse traps in everyone's offices in the Psychology wing. I told them I hadn't seen any mice, but they were quick to point out two little black grains of rice (bleah) on the floor. They didn't clean up the poop.
At this point, the lightbulb went off in my head. Oh, yeah. The chocolate bar that I left on my side table the other day. I came in to find it half eaten. I wasn't pleased as I threw the remainder away (Swiss dark chocolate!)--I figured the cleaning staff had seen it and gotten a bit hungry. Nope. Those must've been mouse teeth marks.
OK, so now: mousetrap. The problem is that it hasn't been working. I come into my office in the morning and regularly find more poops. On my desk. Of course you know, this means war! I don't want to get hantavirus. So yesterday I went out and got a couple of better mousetraps, put some cheese in them (this is what cartoons have taught me: mice love cheese), turned out the lights and left for the day. Heh-heh-heh, I laughed menacingly.
Today I opened the door to my office hesitantly. What would I find? Answer: nothing. OK, not exactly nothing. No mouse. No cheese. Yup, the l'il sucker ripped off my cheese. But at least the mouse traps were still there. This means I'm now helpfully feeding the mouse that's running around on the second floor. Dr Snyder, whose office is just down the hall, recently saw it looking at him from his bookshelf, but he wasn't able to catch it. In my office, however, the mouse prefers my desk. Evidence? Another poop. Probably left right after polishing off those two bits of cheese. Can a mouse be impertinent?
I suppose there's some joke in here somewhere about a psychologist and a mouse, but I'm drawing a blank. Do you know any good ones?
Why aren't you studying?
The Exam Statistics: The Point Biserial Correlation
I'm continuing my explanation of the reams of statistics I get about multiple-choice exams. Last time, I explained exam item difficulty scores. (Fascinating, no?) This time: the point-biserial correlation coefficient, or "rpb". That is, "r" for the correlation coefficient (why, oh why is it the letter r?) and "pb" to specify that it's the point-biserial and not some other kind of correlation. Like, um, some other kind.
If I've constructed a good exam item, it should be neither too hard nor too easy. It should also differentiate among students. But I can't tell how well it does that just by looking at the difficulty score. Instead, there's a more complex measure, the rpb. In general, I need an index that correlates a dichotomous variable with a continuous one. More specifically, I want to correlate the dichotomous variable of a test item (i.e., whether a student answered the item correctly or incorrectly) with the continuous variable of the student's percent score on the whole exam. Got that? I didn't think so.
Let me try again. Student A did well on the exam, getting 90% correct. Student B did not do so well, getting only 50%. If I look at any given exam question, in general, student A should be more likely to answer it correctly than student B. This is not the same as difficulty, because I'm not simply looking at what proportion of the class answered the question correctly. I'm correlating each student's score with their performance on each question. The key to all this is the word "should" in the sentence above.
If an exam item is poorly constructed for whatever reason, good students may do worse on it than students who did poorly on the exam overall. That is, the better you are overall, the less likely you are to answer it correctly. That is not supposed to happen. The rpb gives me this information for each question on the exam. Experts in exam construction recommend that the rpb should range from 0.30 to 1.00. Any question getting an rpb lower than 0.30 means that I will take a look at it and try to figure out why that's happening.
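If you want to see the arithmetic, here's a minimal sketch in Python of how an rpb could be computed for a single item, using a made-up class of six students (the real numbers, of course, come from TS&QS):

```python
import numpy as np

def point_biserial(item_correct, exam_scores):
    """Correlate a 0/1 item score with a continuous exam score:
    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean
    exam scores of students who got the item right vs. wrong, p is
    the proportion who got it right, q = 1 - p, and s is the standard
    deviation of all the exam scores."""
    item_correct = np.asarray(item_correct, dtype=bool)
    exam_scores = np.asarray(exam_scores, dtype=float)
    p = item_correct.mean()
    m1 = exam_scores[item_correct].mean()
    m0 = exam_scores[~item_correct].mean()
    s = exam_scores.std()                    # population SD
    return (m1 - m0) / s * np.sqrt(p * (1 - p))

# Invented data: six students' 0/1 results on one item, and their
# overall exam percentages.
item1 = [1, 1, 0, 1, 0, 0]
scores = [90, 85, 50, 75, 60, 55]
print(f"rpb = {point_biserial(item1, scores):.2f}")   # about 0.94
```

(SciPy also ships this as scipy.stats.pointbiserialr, if you'd rather not roll your own.)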
And if the rpb is negative, well...it's a negative correlation. That's the worst case I described: better students are doing worse on this question, and poorer students are doing better. I won't use any question that gets a negative rpb again unless I can figure out why it's happening. Maybe I can tweak the question, maybe I have to rewrite it to ask about the same knowledge in a different way. Or maybe I'll just give up entirely, go and get a coffee, and check out some LOLcats.
Why aren't you studying?
The Exam Statistics: The Difficulty
In my last post, I discussed how I analyze the mean in my (multiple-choice) exams. This time, I'm going to look at difficulty. This is not directly related to the mean. Huh? Isn't it the case that, the more difficult the exam, the lower the mean? Well, yes. But that's not the "difficulty" I'm writing about.
Among all the pages and pages of results I get from Test Scoring & Questionnaire Services is the "DIF" score or difficulty of each question. It's actually the proportion of the class who answered that question correctly. DIF=1.000 means that everyone got it right, but DIF=0.250 means that only 25% of the class did. But it's not really "difficulty," is it? If a question is really difficult, fewer people will answer it correctly and the number should decrease. So, really, it shouldn't be called difficulty, it should be called easiness. But, look, it's just called "difficulty," OK?
You might be thinking that I want everyone to answer every question correctly, right? Um, sorry to rain on your ice cream, but...no. It's really, really unlikely that everyone was able to learn absolutely everything in the course, and was also able to remember and apply that knowledge on an exam perfectly correctly for every question. What an exam should do is assess each student's learning of the material, and provide some way of differentiating among all students. If all questions are answered correctly, the exam itself has failed.
I went to a seminar last year at which a renowned expert in testing and exam question construction gave a talk. After it was over, I talked to him about DIF scores--specifically, what should they be? The general rule is that an exam question is doing a good job of differentiating among students if it's at least 0.300. That is, at least 30% of the class should be getting each question correct. There is no guideline for the upper end, but at another seminar, I heard an instructor say that she liked to put at least one DIF=1.000 question on each exam as a confidence booster. Yup, a gimme. I thought that was a pretty nice thing to do, so I try to include at least one high DIF question on every one of my exams, too.
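Computationally, there's nothing exotic here. Here's a hedged sketch in Python of what the scoring service presumably does (with invented responses; I have no idea what TS&QS's actual code looks like): take the class's 0/1 results for each item, average them, and flag anything under 0.300.

```python
import numpy as np

# Invented data: rows are students, columns are exam items;
# 1 = answered correctly, 0 = answered incorrectly.
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
])

dif = responses.mean(axis=0)        # proportion correct per item
for i, d in enumerate(dif, start=1):
    flag = "  <-- review me" if d < 0.300 else ""
    print(f"item {i}: DIF = {d:.3f}{flag}")
# Item 3 comes out at DIF = 0.200, so it gets a second look;
# DIF = 0.000 would mean the answer key itself is probably wrong.
```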
So difficulty is related to the mean in that, the higher the DIF, the higher the mean on the exam overall. The mean is good for evaluating the overall performance of the class. But I also need to evaluate the questions on my exams, so I get the DIF score for each one. If the DIF is too low, the question either gets killed (*snff*) or rewritten to clarify it. Oh, and if I ever get DIF=0.000, it means I've keyed in an incorrect answer. Oops.
Why aren't you studying?
The Exam Statistics: The Mean
With the first round of (multiple-choice) midterms over, I'm now swimming in data. I want to tell you about some of the stats I go through to assess and improve my exams. Unfortunately, I'm too late to celebrate (the first) World Statistics Day. But I don't feel too bad. At least statistics has a day. It's not like there's a "Psychology Month" or anything. Oh, look--yes there is. And I'm late for that, too. Moving on...
This installment is about the (arithmetic) mean, or, if you insist, the "average." I post the class mean of every exam because you demanded it! Really, though--what use is it to you? For classes that don't grade on the curve, you don't need to know the mean (or standard deviation) to determine your absolute standing in the class. Just take your percentage correct, and see what grade that corresponds to in the syllabus. Right?
Yes, that's important. But don't you want to know how everyone else did, too? Sure you do. "Did everyone think that exam was a killer, or just me?" We want to compare ourselves to other people. Some students even want to know what the top score was. "Did anyone get 100%?" "Am I the best in the class?"
The mean also serves another purpose, when there are multiple forms of an exam. In larger classes, multiple forms of an exam are used to discourage cheating (or at least, to make it more difficult). Typically, there is one form that has the questions arranged in order of topics (e.g., questions based on the first lecture and textbook chapter first, followed by questions on the second lecture and chapter, etc.). The other forms will have the questions in a random order. Are students who get the scrambled forms at a disadvantage? Or, put another way, is there a benefit to answering questions in a sequence that reflects the arrangement of the learning materials? If so, that wouldn't be fair, would it?
The data from every exam includes the means from each form. They are usually a little bit different. But is that difference a fluke, or is it due to the ordering of questions? Hmm, sounds like a job for...statistics! The data also includes the results of an ANOVA (analysis of variance) that compares the means to each other. That is, are any differences statistically significant? The answer: no. I've never had a difference at p < 0.01 or even p < 0.05. That means any differences between forms are small--the kind of thing chance alone produces.
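For the curious, here's what that check looks like as a quick sketch in Python, using SciPy's one-way ANOVA (the scores below are invented; TS&QS runs the real analysis):

```python
from scipy.stats import f_oneway

# Invented percent scores from three forms of the same exam.
form_a = [72, 68, 81, 75, 70, 77, 66, 74]   # questions in topic order
form_b = [70, 73, 79, 69, 76, 71, 68, 75]   # scrambled order
form_c = [74, 67, 72, 78, 70, 69, 73, 71]   # scrambled order

f_stat, p_value = f_oneway(form_a, form_b, form_c)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
# If p >= 0.05, the differences among the form means are no bigger
# than chance would produce: no form gives an unfair advantage.
```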
The bottom line: It doesn't matter which form you get. Isn't science cool?
Why aren't you studying?
The Coffee
When I went to high school, it wasn't cool to drink coffee. Coffee was dark, scary, and bitter. Sure, my family would have Kaffeezeit ("coffee time") on the weekends, but I was just in it for the Kuchen (the cake).
When I started university, I drank a cup of tea with milk and sugar every single morning. Even though I had a lot of 8:00 classes (because they were good classes only offered at that time, that's why), one cup of English breakfast tea was all the caffeine I needed. Some of my friends became desperate around exam time, and dipped into the go-juice. It was hilarious to watch these normally caffeine-free friends' eyes get really big after a big cuppa joe. Then, they'd study like the dickens. This proved that coffee was a dangerous, dangerous substance.
Then, I started graduate school. Sure, getting a graduate degree is pretty demanding. Maybe I'd have an extra cup of tea once in a while. But the sheer, stark terror of almost having to go into the Real World was enough stimulus for me--no coffee, thanks. Maybe just a bit more sugar.
Then, one term I was Dr. Dawson's teaching assistant. He made me come to the class (I dunno, to learn something I guess); because he was on my supervisory committee, it's not like I could say no or anything. The first day of class, I met him at his office and we went to the class together. But not before he poured himself a cup of freshly made French-pressed coffee. And then he insisted on giving me a cup. It's not like I could say no or anything. The worst thing was that the coffee was: black. As black as night. No milk, and certainly no sugar (isn't that freebasing?). After that, I brought my own mug and poured sugar into it first--milk was too conspicuous.
Now, I'm neither a connoisseur (yes, I drink instant--please forgive me), nor a coffee-hound. Usually, I get by with only one cup of coffee. But if I've had a bad sleep (which does happen), you'll see me toting a cup from one of the fine local purveyors. Not my usual mug-o-water; not tea; not vodka. I have gone over to the dark side. With lots of milk and sugar.
Why aren't you studying?
About Me
- Karsten A. Loepelmann
- Edmonton, Alberta, Canada
- Faculty Lecturer in Psychology at the University of Alberta