Saturday, 22 January 2005

Grade inflation

Leopold Stotch at OTB is indignant over Princeton’s new policy of capping the number of A’s at 35% of the class. I’m a little new to academia, but I’m not sure the policy is as objectionable as he seems to think.

Administrators are limited in their ability to judge the performance of professors. The problem—and the reason many administrators default to grade distributions as a measure of grade inflation—is that educational outputs aren’t easily observable, whereas grades are. Making educational outputs observable requires administrators to overcome a great deal of uncertainty and cost. Attempts to do so include student surveys—which, not surprisingly, correlate strongly and positively with grades—and observing professors in class. The latter is quite expensive and may simply result in the professor being on his best behavior when the auditors are present.

One of the reasons that research universities use publication to make tenure decisions is that publication is easily observable, as is the quality of the publication (an ‘A’ journal, ‘B’ journal, and so forth). In any case, this is a topic that will be with us a while, if not forever.

Tuesday, 25 January 2005

Grade stagflation

Since before Robert’s post on this topic, I’ve been pondering grades in general, prompted by this post by Will Baude relating his experience at Yale, where he hasn’t yet “taken any classes that attempt to draw actual distinctions among the students.” Indeed, the very purpose of grading (as opposed to marking, as my Canadian colleague refers to it) is to discriminate among students on the basis of their academic performance. Thus all of the participants in the debate are right in their own ways, but I think they (individually, at least) miss the big picture.

Leopold Stotch and Steven Taylor both bemoan the administrative meddling in grade assignment inherent in Princeton’s decision, a sentiment with which (as a fellow political science professor) I must concur, lest we become like those emasculated law profs who not only no longer control their grades but also lack control over their own exam conditions. On the other hand, Nathan Novak thinks it’s a non-issue, due to the widespread use of class rank to compare students from different institutions; Andrew Samwick makes a similar point, although he acknowledges that grade inflation does lead to compression of the grade range. At the extreme end of the scale, the Grouch thinks grades don’t matter at all; I wouldn’t go that far, due to reasons of path dependence, but I can see his point—few people today care what grades I got in high school or as an undergraduate, but I wouldn’t be a professor today if I hadn’t gotten mostly A’s and B’s.

I think what Princeton is trying to do (rightly or wrongly) is address the “compression” problem that Andrew talks about—if 50% of the class are getting A’s, any meaningful discrimination among those students has been eliminated; in other words, there’s been a loss of information in the process. If the purpose of grading is simply to drop passing students in buckets based on their absolute performance, giving 50% of students A’s might be appropriate; on the other hand, if the purpose of grading is to determine the relative merit of students, putting 50% of them in a single category isn’t very helpful.
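The “loss of information” framing can be made literal with Shannon entropy. A quick sketch (the two grade distributions are invented for illustration): the more compressed the distribution, the fewer bits of information a grade conveys about where a student actually stands.

```python
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a discrete grade distribution."""
    return -sum(p * log2(p) for p in dist if p > 0)

# Invented distributions over the grades A/B/C/D/F
compressed = [0.50, 0.30, 0.15, 0.03, 0.02]  # half the class gets an A
spread = [0.20, 0.30, 0.30, 0.15, 0.05]      # grades discriminate more

print(f"{entropy(compressed):.2f} bits")  # ≈ 1.70 bits per grade
print(f"{entropy(spread):.2f} bits")      # ≈ 2.13 bits per grade
```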

The trouble is, we expect grades to do both of these things. The Millsaps College catalog, for example, requires all political science majors and minors to earn C’s in all of their coursework for the major (which leads to its own sort of compression effect, since effectively the minimum passing grade is raised from a 60 to a 73), and students must maintain a 2.0 GPA to participate in various and sundry extracurricular activities—the C and 2.0 represent absolute standards. But we also use grades to evaluate relative achievement, for election to honorary societies such as Phi Beta Kappa and for awarding other honors.

I don’t know that there’s a simple answer to these problems, although perhaps including measures of central tendency and dispersion along with assigned grades (as is at least partially the case at Dartmouth, according to Andrew) might be a good start.
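A minimal sketch of what such reporting might look like; the function name and format here are my own invention, not anything Dartmouth actually does.

```python
from statistics import mean, pstdev

def contextual_grade(score, class_scores):
    """Report a raw score alongside the class mean, standard deviation,
    and the student's z-score, restoring the relative information that a
    bare letter grade throws away."""
    m, s = mean(class_scores), pstdev(class_scores)
    z = 0.0 if s == 0 else (score - m) / s
    return f"{score} (class mean {m:.1f}, sd {s:.1f}, z = {z:+.2f})"

print(contextual_grade(90, [70, 80, 90]))  # → 90 (class mean 80.0, sd 8.2, z = +1.22)
```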

Friday, 28 January 2005

More on grades

Orin Kerr discusses law school grading practices, including the notorious (and universal) use of strict curves, without as much overthinking as I engaged in earlier this week.

Monday, 7 February 2005

HOPE = grade inflation

Alex Tabarrok notes recent research suggesting that Georgia’s expensive HOPE scholarship program has done little to improve access for disadvantaged students to the state’s higher ed system, at the expense of producing rampant high school grade inflation and encouraging students to avoid challenging courses in college so they can keep their scholarships.

The best that can be said for the program is that it keeps talented students in-state, which may reduce the mobility of smart people away from Georgia; whether that’s sufficient to justify a massive middle class entitlement program (financed off the stupidity of the poor, in the form of lottery ticket sales) I leave as an exercise for the reader.

Thursday, 4 May 2006

Grade this

Steven Taylor makes a point about grading that I should nail to my office door, or at least my Blackboard announcements page. It simply amazes me how much grade-grubbing I get, and my ex-students will generally attest that I am not a tough grader to begin with, at least on above-average work—I’m still not quite sure how I landed in the toughest 40% of graders at Millsaps, but I doubt it was through any conscious effort on my part.

To give an example: my methods class essentially got 20% of their final grade gratis and the average final paper grade (worth 30%) was around an 87; even with a somewhat tougher set of midterm grades, the class average going into the final is just over 90%. Granted, I don’t expect the average to stay above 90 after the final, but nobody who did the work and made an honest effort is going to get out of the class with less than a B-.

Thursday, 4 October 2007

Dangerous curves

In response to an Orin Kerr post about a grade complaint lawsuit against the University of Massachusetts, Megan McArdle asks why professors use curves in the first place:

[W]hy do faculty, particularly at the undergraduate level where the task is mastery of a basic body of knowlege, set exams where the majority of the students can’t answer a majority of the questions? Or, conversely, as I’ve also seen happen, where the difference between an A and a C is a few points, because everyone scored in the high 90’s? Is figuring out what your students are likely to know really so hard for an experienced teacher?

I’ve spent a lot of time the last four years looking into psychometric theory as part of my research on measurement (you can read a very brief primer here, or my working paper here), so I think I can take a stab at an answer. Or, a new answer: I’ve blogged a little about grading before at the macro level; you might want to read that post first to see where I’m coming from here.

The fundamental problem in test development is to measure the student’s domain-specific knowledge, preferably about things covered in the course. We measure this knowledge using a series of indicators—responses to questions—which we hope will tap this knowledge. There is no way, except intuition, to know a priori how well these questions work; once we have given an exam, we can look at various statistics that indicate how well each question performs as a measure of knowledge, but the first time the question is used it’s pure guesswork. And, we don’t want to give identical exams multiple times, because fraternities and sororities on most campuses have giant vaults full of old exams.

So we are always having to mix in new questions, which may suck. If too many of the questions suck—if almost all of the students get them right or get them wrong, or the good students do no better than the poor students on them—we get an examination that has a very flat grade distribution for reasons other than “all the students have equal knowledge of the material.”

It turns out in psychometric theory that the best examinations have questions that all do a good job of distinguishing good from bad students (they have high “discrimination”) and have a variety of difficulty levels, ranging from easy to hard. Most examinations don’t have these properties; the people who write standardized tests like the SAT, ACT, and GRE spend lots of time and effort on these things and have thousands of exams to work with, and even they don’t achieve perfection—that’s why they don’t report “raw” scores on the exams, instead reporting “standardized” scores that make them comparable over time.
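The two item statistics just described can be sketched in a few lines, assuming a toy matrix of 0/1 item responses. This is the uncorrected point-biserial (the item’s own contribution to the total score is not removed), which slightly flatters discrimination, but it shows the idea.

```python
from statistics import mean, pstdev

def item_stats(responses):
    """responses: one list of 0/1 item scores per student.
    Returns (difficulty, discrimination) for each item, where difficulty
    is the proportion answering correctly and discrimination is the
    point-biserial correlation between the item and the total score."""
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    stats = []
    for j in range(n_items):
        item = [r[j] for r in responses]
        p = mean(item)  # difficulty: proportion correct
        sd_i, sd_t = pstdev(item), pstdev(totals)
        if sd_i == 0 or sd_t == 0:
            disc = 0.0  # an item everyone gets right (or wrong) can't discriminate
        else:
            cov = mean(i * t for i, t in zip(item, totals)) - p * mean(totals)
            disc = cov / (sd_i * sd_t)
        stats.append((p, disc))
    return stats

# Toy data: four students (rows), three items (columns), best student first
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]
for p, disc in item_stats(responses):
    print(f"difficulty {p:.2f}, discrimination {disc:.2f}")
```

Item 1 is answered correctly by everyone, so it tells us nothing; items 2 and 3 are harder and separate the stronger students from the weaker ones.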

If you go beyond simple true/false and multiple choice tests, the problems become worse; grading essays can be a bit of a nightmare. Some people develop really detailed rubrics for them; my tendency is to grade fairly holistically, with a few self-set guidelines for how to treat common problems consistently (defined point ranges for issues like citation problems, grammar and style, and the like).

So, we curve and otherwise muck with the grade distribution to correct these problems. Generally speaking, after throwing out the “earned F” students (students who did not complete all of the assignments and flunked as a result), I tend to aim for an average “curved” grade in the low 80s and try to assign grades based on the best mapping between the standard 90/80/70/60 cutoffs and GPAs. It doesn’t always work out perfectly, but in the end the relative (within-class) and absolute grades seem to be about right.
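A rough sketch of that procedure as code; the target mean and the simple linear shift are illustrative assumptions, since in practice the mapping involves more judgment than this.

```python
def curve(raw_scores, target_mean=82):
    """Shift raw scores so the class mean lands in the low 80s, then map
    the shifted scores onto the standard 90/80/70/60 letter cutoffs."""
    shift = target_mean - sum(raw_scores) / len(raw_scores)
    def letter(score):
        s = min(100, score + shift)
        for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
            if s >= cutoff:
                return grade
        return "F"
    return [letter(score) for score in raw_scores]

print(curve([70, 80, 90]))  # → ['C', 'B', 'A']
```

Note that the “earned F” students would be dropped before computing the shift, so a handful of zeros doesn’t drag everyone else’s grades upward.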

Update: More on grading from Orin Kerr here.

Thursday, 19 February 2009

QotD, screw ever getting a teaching award edition

The entitlement society marches on:

“I think putting in a lot of effort should merit a high grade,” Mr. [Jason] Greenwood said. “What else is there really than the effort that you put in?”

“If you put in all the effort you have and get a C, what is the point?” he added. “If someone goes to every class and reads every chapter in the book and does everything the teacher asks of them and more, then they should be getting an A like their effort deserves. If your maximum effort can only be average in a teacher’s mind, then something is wrong.”

It will come as no surprise to any observer of contemporary collegiate culture that Mr. Greenwood is a kinesiology major, often a refuge for future gym teachers and meathead football coaches who think the education school’s curriculum is far too challenging. “Doing everything the teacher asks of [you]” isn’t A-worthy; doing everything the teacher asks of you better than most other people do it and achieving mastery thereof is A-worthy. And I say that as someone who has historically been a relatively lenient grader.

Bonus quote:

Sarah Kinn, a junior English major at the University of Vermont, agreed, saying, “I feel that if I do all of the readings and attend class regularly that I should be able to achieve a grade of at least a B.”

Via QandO, Critical Mass, Orin Kerr, and Jacob Levy, the latter of whom dissents in part.

Snark aside, I think “consumer demand” by students is a less compelling aspect of the problem—or at least of the dimension of the problem I see at TAMIU, which is rather different from the one I observed teaching at selective private institutions—than the complicity of faculty and, particularly, administrators. In a misguided effort to retain students (and, perhaps more importantly, their associated free money from state and federal coffers—the marginal cost of student instruction is essentially zero from an administrative perspective), administrators encourage faculty to reward students for occupying space and going through the motions, keeping students in college who have neither the interest in nor any actual need for a four-year degree.

My past thoughts on grading in general can be found here and here.