Saturday, 19 May 2012

cnlmisc 0.2 for R

The oft-promised update of the cnlmisc package for R is now posted. New in this release is a convenience method, sepplot, that produces separation plots using the separationplot package; it operates directly on model fit objects as a post-estimation call and currently supports both binary and ordinal models. In addition, epcp now works with clm2 objects from the ordinal package.

Most of this was motivated by continued work on the economic voting paper, which has also been updated. cnlmisc still has a long way to go before I submit it to CRAN, but at least it’s progress, right?

Monday, 12 April 2010

epcp for R

I have finally packaged up a very rough port of my epcp routine from Stata to R as part of a package unimaginatively called cnlmisc; you can download it here. In addition to the diagnostics that the Stata routine provides, the glm method includes a bunch of R-squared-like measures from various sources (including Greene and Long).

The only part I’m sure works at the moment is the epcp for glm objects (including survey’s and Zelig’s wrappers thereof); the others that are coded (for polr and VGAM) are probably half-working or totally broken, and some wrappers aren’t there yet at all. The error bounds suggested by Herron aren’t there either. The print routines need a lot of work too; eventually it will have a nice toLatex() wrapper as well. But it beats having it sit on my hard drive gathering dust; plus I may eventually get motivated to write a JSS piece or something based on it.

epcp for Stata is still available at my site. For more information on the measure, see Michael C. Herron (1999), “Postestimation Uncertainty in Limited Dependent Variable Models,” Political Analysis 8(1): 83–98, or Moshe Ben-Akiva and Steven Lerman (1985), Discrete Choice Analysis, MIT Press.
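For readers unfamiliar with the measure: Herron’s ePCP for a binary-outcome model is simply the average probability the fitted model assigns to each observation’s actual outcome. A minimal sketch in Python (the package itself is in R, and the fitted probabilities below are invented for illustration):

```python
def epcp(p_hat, y):
    """Expected proportion correctly predicted (Herron 1999) for a
    binary model: the average predicted probability assigned to each
    observation's *observed* outcome."""
    return sum(p if yi == 1 else 1 - p for p, yi in zip(p_hat, y)) / len(y)

# Illustrative fitted probabilities and observed outcomes (made up):
p_hat = [0.9, 0.8, 0.3, 0.2]
y = [1, 1, 0, 1]
print(epcp(p_hat, y))  # (0.9 + 0.8 + 0.7 + 0.2) / 4 = 0.65
```

Unlike the plain percent-correctly-predicted, ePCP rewards a model for being confident about the right answers, not just for clearing the 0.5 threshold.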

Friday, 8 January 2010

From the department of bad statistics

I’m glad to see that some things never change; in this case, it’s the low quality of local news reporting in Laredo. Pro8News breathlessly reports that Mexican drivers are ‘less likely to be ticketed’ since less than 25% of parking tickets in Laredo are issued to Mexican-licensed vehicles.

This story just begs to be placed on a research methods final as one of those “identify all of the problems with this analysis” questions. Bonus points for invoking Bayes’ theorem.

Wednesday, 19 August 2009

Extrapolate this

It took me two years to finally get back to it, but the paper from my 2007 PolMeth poster is reasonably close to done after throwing out virtually everything I did for the previous iteration of the research. Now on to the year-old projects.

Thursday, 2 April 2009

Posterized

The bits of paper I’m hanging on the wall tomorrow (well, later today) at Midwest wherein I discuss “Geographic Data Visualization Using Open-Source Data and R” are now online here for the curious or insomnia-stricken.

Thursday, 19 March 2009

How not to do data analysis

io9 presents a chart that purports to show that shark-jumping has an effect on television ratings. I’ll freely concede that Battlestar Galactica has had its, er, weaker moments, but the chart doesn’t actually show that creatively weak episodes had any effect whatsoever on the ratings that can be distinguished from the underlying, secular downward trend in ratings.

Since I had about 300 more important things to do, I decided to analyze the data myself. First, I reentered the ratings data from here into an OpenOffice.org spreadsheet and then identified the “shark-jump” episodes with a dummy variable, with the help of IMDB. I then created two new variables: a simple ratings difference variable for each episode, and a dummy variable to indicate whether or not an episode immediately followed an identified shark-jump.

I then converted to a CSV file, opened R, and estimated a linear regression: Delta = a + b(FollowShark). While the effect of an episode following a shark-jump was negative (about 0.025 ratings points), the effect was not statistically significant (p ≈ 0.736, two-tailed). Throwing out “Razor” and “The Passage,” to focus on episodes io9 says showed ratings losses improves the coefficient to about -0.042 ratings points, but it is still not significant (p ≈ 0.613, two-tailed).
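For the curious, the setup can be sketched as follows; note that the numbers here are invented stand-ins, not the actual Battlestar Galactica ratings data:

```python
# Sketch of the design: per-episode ratings changes regressed on a
# dummy for "episode immediately follows a shark-jump". The data
# below are invented for illustration only.
import numpy as np

delta = np.array([-0.1, 0.2, -0.3, 0.0, -0.2, 0.1, -0.4, -0.1])  # ratings change
follow_shark = np.array([0, 0, 1, 0, 1, 0, 1, 0])                # post-shark dummy

# OLS: Delta = a + b * FollowShark
X = np.column_stack([np.ones_like(delta), follow_shark])
a, b = np.linalg.lstsq(X, delta, rcond=None)[0]
print(f"intercept={a:.3f}, follow-shark effect={b:.3f}")
```

With a single dummy regressor, b is just the difference between the mean ratings change after shark-jump episodes and the mean change otherwise, which is why the test here is equivalent to a two-sample t-test.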

So, the moral of the story: the episodes identified may have been “shark jumps,” but they didn’t seem to have a discernible effect on the ratings of subsequent episodes. And, besides, any analysis that doesn’t identify the crapfest known as “Black Market” as a shark-jumping incident isn’t worth taking seriously to begin with.

Bear in mind that TV ratings themselves leave something to be desired; variations of several tenths of a ratings point are within the expected margin of measurement error.

Thursday, 19 February 2009

Be still my heart

From the description of the memisc package for R:

One of the aims of this package is to make life easier for useRs who deal with survey data sets. It provides an infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) SPSS and Stata files. Further, it provides functionality to produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates. Also some convenience tools for graphics, programming, and simulation are provided. [emphasis added]

How did I miss this package before? It makes analyzing NES data—heck, any data with value labels and missing values—in R an almost sane thing to do.

Saturday, 7 February 2009

Repurposed content

Herein I present a rant on one-tailed tests in the social sciences; feedback welcome:

Unless you have a directional hypothesis for every coefficient before your model ever makes contact with the data, you have no business doing a one-tailed statistical test. Besides, if your hypotheses are solid and you have a decent n, the tailedness shouldn’t determine significance or the lack thereof.

Thought experiment: assume you present a test in a paper that comes out p=.06, one-tailed. That means you have a hypothesis that doesn’t really work to begin with (sorry, “approaches conventional levels of statistical significance”). More importantly, if you just made up the tailedness hypothesis post facto to put a little dagger (or heaven forbid a star) next to the coefficient, you really did a two-tailed test with p=.12 and then post-hoc justified it to make the finding sound better than it really was.

Now here’s the center of the rant: I really don’t believe you actually knew the directionality of your hypothesis before you ran the test and were willing to stick with it through thick and thin, since I know that you’d be figuratively jumping up and down with excitement and report a significant result if the “sign was not as expected” and it came out p=.003 two-tailed (p=.0015 one-tailed, opposite directionality), rather than lamenting how it turned out with p=.9985 on your original one-tailed test. I dare say nobody has ever published an article claiming the latter (although I might give it a positive review just for kicks).
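The arithmetic in the thought experiment is easy to verify; a quick sketch, using the standard normal as the reference distribution for the test statistic:

```python
from statistics import NormalDist

nd = NormalDist()

# A "one-tailed" p is half the two-tailed p when the sign goes your
# way, and one minus that half when it doesn't.
z = nd.inv_cdf(1 - 0.06)               # z giving p = .06 one-tailed (predicted direction)
p_two = 2 * (1 - nd.cdf(z))            # the corresponding two-tailed p = .12

z_wrong = -nd.inv_cdf(1 - 0.0015)      # "wrong sign": p = .003 two-tailed
p_one_original = 1 - nd.cdf(z_wrong)   # the honest original one-tailed p = .9985

print(round(p_two, 3), round(p_one_original, 4))  # 0.12 0.9985
```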

And I really don’t feel the need to have these discussions with sophomores and juniors, which is why I prefer books that stick to two-tailed tests (aka “not Pollock” [a textbook I otherwise really like]) and spare me the rant.

See also: this FAQ from UCLA, which is a little more lenient—but not much.

Thursday, 23 October 2008

Bust a move

I enjoyed last night’s episode of Mythbusters for a variety of reasons. For starters, I now have Smart Board Envy™, projectiles and explosions are always fun, and seeing Adam, Jamie, and Kari drunk was a hoot.

The social scientist in me, though, really enjoyed the “beer goggles” experiment. In fact, the show, edited down to just include that section, would make a great primer on “how social scientific experiments work” for my undergraduate methods course when I teach it again, presumably next fall. On the other hand, I was less thrilled with the “sobering up” experiment, but the comedy factor of drunken Adam trying to run on a treadmill without a handrail, with all-too-predictable results, made up for the scientific shortcomings therein.

Monday, 6 October 2008

Choosing the wrong denominator

Matthew Yglesias demonstrates innumeracy in action:

We got an interesting experiment this weekend as Bill Maher’s anti-religious screed Religulous and David Zucker’s right-wing satire An American Carol both opened. The two films were about tied in terms of total revenue but Carol was on three times as many screens, so basically Religulous was far more successful.

I think this mostly reflects something I wrote about a couple of years ago — the moviegoing audience is very demographically similar to the Democratic Party voting audience. It’s disproportionately young, disproportionately childless, and tilted toward residents of big cities and away from residents of rural communities. Conversely, the audience for television news is demographically very conservative (older, white, and a bit more prosperous than average) which is one major reason TV news coverage tilts right. The big screen audience for what looks like a witless screed against God is just a lot bigger than the big screen audience for what looks like a witless screed against Michael Moore.

Actually, since total revenue for both movies was about the same, it would appear that the “big screen audience” for crappy polemical Bill Maher movies is about the same as the “big screen audience” for crappy polemical David Zucker movies. Further, since I’d guess Carol probably played in theaters with lower ticket prices on average than Religulous, the former probably did a little better in terms of the total audience.

Yglesias also makes the rather faulty assumption that the per-screen average revenue would remain flat as screens increased. This result only obtains if moviegoers don’t select theaters based on what movies are playing at them or if screens are very distant from each other geographically; while surely there are some people who just go to the movies to see something without deciding beforehand which movie to see, I doubt there are enough of these people to ensure Maher’s movie would gain a substantially larger audience, except in the relatively uncommon cases where the film just isn’t on at any theater in a metropolitan area and there is a substantial number of people who want to see it.

Besides, all the discerning moviegoers were at Nick and Norah’s Infinite Playlist this weekend anyway.

Wednesday, 3 September 2008

Too many synonyms

I’m now up to five different terms for “independent variable” (the X’s) in my methods lecture for today (thanks in part to a PolMeth post that just reminded me of one I’d forgotten to put on the list), and I’m probably missing some. You’d think we could narrow that down a little. The others: covariate, predictor, explanatory variable, and regressor.

Funnily enough, I only came up with “dependent variable” and “regressand” for Y, but I’m surely missing some there too.

Friday, 30 May 2008

Nothing is ever simple

Reading CRANberries this morning I remembered that I’d never gotten around to packaging Amelia for Debian. So I dutifully filed my ITP and got to work on adapting the package to build with Debian, which thanks to Dirk’s hackery on R support in cdbs was pretty easy—copy over the debian directory from my Zelig package, update the copyright file, fix up the control file, update the Debian changelog, fix a lintian warning or two (FSF address in the copyright file), and it’s basically done.

Then I discovered that Amelia also throws in a couple of Tcl/Tk libraries. One, BWidget, is already packaged, so all I had to do was delete the copy installed by the Amelia package and add a dependency on it. The other is Combobox, the exact license of which follows:

completely, totally, free. I retain copyright but you are free to use the code however you see fit. Don’t be mean.

Yay. I get to play license negotiator again. I really love creating extra work I really don’t need for myself…

Monday, 26 May 2008

The use and abuse of technology in the classroom

Michelle’s post‡ today on laptops in the classroom (in a similar vein to this article I read last month on the suggestion of Glenn Reynolds) reminded me that I had a few items from the past few weeks still in my Google Reader queue of “things to blog about” related to Margaret Soltan’s continuing crusade against the use of PowerPoint* and its ilk, and specifically Timothy Burke’s partial rebuttal:

What’s the difference between bad usage of PowerPoint in lectures and bad lectures that involve hand-outs, overhead transparencies and writing on the chalkboard? Are we just complaining about old wine in new bottles here? Is the real culprit professorial droning at classrooms of 200+ students followed by recite-repeat-and-forget examinations? I think it’s at least plausible that the technology is just giving us a new reason to pay attention to a pedagogy whose effectiveness has been suspect for two generations.

I dare say I’m among the last doctoral students who were “trained”—and I use that word loosely—to teach prior to the widespread use of PowerPoint. Four years of full-time in-classroom experience, mostly with small lectures and seminars, has brought me basically to agreement with Burke on this point—complaints about PowerPoint essentially boil down to complaints about either instructional laziness or the whole nature of lecturing, or as a Burke commenter puts it, “[e]xactly how does one teach even 80† students at once without succumbing to passive data transfer?” The non-use of PowerPoint or some other form of instructional technology seems to me to be a luxury confined to those who only teach small seminars and graduate students, and while my personal career aspirations lean in that direction, the reality is that I’m several years away (in terms of research productivity) from being there—if I ever get there.

Burke in his comments hits the nail on the head, I think, when it comes to any sort of visual presentation in class:

It seems to me that the absolutely key thing is to avoid speaking the slides literally. They’re best as definitions, key concepts, images: the kind of thing you’d stop your flow of lecturing to write on the chalkboard. They’re not the lecture itself.

I think there are three useful aspects to a lecture: what you put on the board (or slides), what you say, and the general outline. If you’re preparing a handout or something to stick on Blackboard for students, the outline or outline-plus-slides is what they need, along with space to fill in the gaps. An alternative approach is to make the slides/board material the outline; several of the more effective teachers I had (my high school history teacher and a political science professor at Rose-Hulman) took that approach. But you can’t shovel your script into PowerPoint and expect that to work well, any more than you’d expect writing it all on the board, or for that matter reading a paper verbatim at a conference, to make for a good presentation.

All this discussion leaves aside the question of teaching anything that involves symbols (chemistry, mathematics, statistics) which I think requires a different approach than bullet-points. In class, mathematics and statistics (and, by extension, social science research methods courses) lend themselves to a combination of “passive” PowerPoint-style presentation and more spontaneous problem-solving and brainstorming; for example, one of my early activities is to have the class try to operationalize (define in terms of a measurable quantity or quality) a concept like “globalization,” which you can’t really do with a static slideshow even though you can define terms like “operationalization” that way. Similarly, while you can step through the process of solving a problem in a slideshow I think it’s more effective to demonstrate how to step through the process on the board.

Unfortunately, many classrooms aren’t set up to allow you to present and use a board simultaneously; some of TAMIU’s lecture halls have a nice design where the projection screen is above the board, so you can write on the board without having to do anything special with the slideshow, but rooms most places are designed for “either-or,” which can be a real pain—fiddle with the control system to blank the screen, raise the screen, write on the board, then lower the screen, switch the screen back on. After a few iterations of that in a single class, you’ll never do it again.

I freely admit I haven’t figured everything out yet; my current methods slides are pretty good lecture notes but pretty rotten for projection. One of my projects for this summer (postponed from last summer after I learned I wouldn’t be teaching any methods courses this year) is to work on my research methods lectures to incorporate advice from Andrew Gelman’s book so I can lay the groundwork for my plot to take over the world, er, effort to produce a workable, but rigorous, methods curriculum at both the undergraduate and master’s levels for political science, sociology, and (at the grad level) public administration.

More on this theme from Laura at 11D, who takes note of some of the more positive technological developments associated with academe. And, another of Burke’s commenters links this hilarious example of what not to do with your slides.

* I use “PowerPoint” as shorthand for the use of a computer-projector based slideshow-style sequential presentation of items associated with a lecture, a technique obviously made famous by the Microsoft software package but also available with many other software packages such as Apple’s Keynote, OpenOffice.org Impress, and several PDF viewers including Adobe Reader, xpdf, and GNOME’s Evince.
† I’d put the cutoff significantly lower, at around 30–40 students. Beyond that point, one might as well just blow the cap off the class.
‡ By the way, it’s nice to see Michelle’s blog back from hiatus! (Where else would I keep up with current Mexican politics?)

Friday, 16 May 2008

Things that are icky about R

Andrew Gelman notes that the default graphics functions suck and that R acts as though every number is conceptually a signed float. Gelman is told that the default graphics functions aren’t the ones we’re supposed to use these days (e.g., Trellis graphics, a.k.a. lattice, plus a bunch of stuff I’d never heard of before today) and that R does have some idea that not all numbers are floats, but you have to convince R of that yourself, or something.

I think Gelman wins the argument by default.

Friday, 25 January 2008

Math works

I had fun today in class with the following formula: 0.98/√N.
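(For the curious: that’s the worst-case 95% margin of error for a sample proportion, since 1.96 · √(0.25/N) = 0.98/√N.) A quick check:

```python
import math

def moe(n, p=0.5, z=1.96):
    """95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# At p = 0.5 (the worst case) this collapses to 0.98 / sqrt(n):
for n in (400, 1000, 1500):
    print(n, round(moe(n), 3), round(0.98 / math.sqrt(n), 3))
```

This is why polls of roughly 1,000 respondents so often report a margin of error of about ±3 points.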

Wednesday, 23 January 2008

What I said

Timothy Burke articulates in a far better way an idea that Frequent Commenter Scott and I discussed (probably very loudly in a way so as to maximally annoy other patrons) at a bar in Chicago sometime last year, although my pitch was more for Mythbusters gone social science. Then again, maybe just using straight Mythbusters is a better idea; I doubt very many people would tune in to watch Kari Byron analyze data in SPSS.

Thursday, 4 October 2007

Dangerous curves

In response to an Orin Kerr post about a grade complaint lawsuit against the University of Massachusetts, Megan McArdle asks why professors use curves in the first place:

[W]hy do faculty, particularly at the undergraduate level where the task is mastery of a basic body of knowledge, set exams where the majority of the students can’t answer a majority of the questions? Or, conversely, as I’ve also seen happen, where the difference between an A and a C is a few points, because everyone scored in the high 90’s? Is figuring out what your students are likely to know really so hard for an experienced teacher?

I’ve spent a lot of time the last four years looking into psychometric theory as part of my research on measurement (you can read a very brief primer here, or my working paper here), so I think I can take a stab at an answer. Or, a new answer: I’ve blogged a little about grading before at the macro level; you might want to read that post first to see where I’m coming from here.

The fundamental problem in test development is to measure the student’s domain-specific knowledge, preferably about things covered in the course. We measure this knowledge using a series of indicators—responses to questions—which we hope will tap this knowledge. There is no way, except intuition, to know a priori how well these questions work; once we have given an exam, we can look at various statistics that indicate how well each question performs as a measure of knowledge, but the first time the question is used it’s pure guesswork. And, we don’t want to give identical exams multiple times, because fraternities and sororities on most campuses have giant vaults full of old exams.

So we are always having to mix in new questions, which may suck. If too many of the questions suck—if almost all of the students get them right or get them wrong, or the good students do no better than the poor students on them—we get an examination that has a very flat grade distribution for reasons other than “all the students have equal knowledge of the material.”

It turns out in psychometric theory that the best examinations have questions that all do a good job of distinguishing good from bad students (they have high “discrimination”) and have a variety of difficulty levels, ranging from easy to hard. Most examinations don’t have these properties; the people who write standardized tests like the SAT, ACT, and GRE spend lots of time and effort on these things and have thousands of exams to work with, and even they don’t achieve perfection—that’s why they don’t report “raw” scores on the exams, instead reporting “standardized” scores that make them comparable over time.

If you go beyond simple true/false and multiple choice tests, the problems become worse; grading essays can be a bit of a nightmare. Some people develop really detailed rubrics for them; my tendency is to grade fairly holistically, with a few self-set guidelines for how to treat common problems consistently (defined point ranges for issues like citation problems, grammar and style, and the like).

So, we curve and otherwise muck with the grade distribution to correct these problems. Generally speaking, after throwing out the “earned F” students (students who did not complete all of the assignments and flunked as a result), I tend to aim for an average “curved” grade in the low 80s and try to assign grades based on the best mapping between the standard 90–80–70–60 cutoffs and GPAs. It doesn’t always work out perfectly, but in the end the relative (within-class) and absolute grades seem to be about right.
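A mechanical sketch of the curving scheme just described: shift scores so the class mean lands in the low 80s, then apply the standard cutoffs. (The target mean of 82 and the raw scores are my choices for illustration; in practice the mapping involves more judgment than a flat shift.)

```python
def curve(raw_scores, target_mean=82):
    """Shift all scores so the class mean hits target_mean (illustrative)."""
    shift = target_mean - sum(raw_scores) / len(raw_scores)
    return [s + shift for s in raw_scores]

def letter(score):
    """Standard 90/80/70/60 cutoffs."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"

raw = [88, 75, 70, 62, 55]  # made-up raw exam scores, "earned F"s already dropped
print([letter(s) for s in curve(raw)])  # → ['A', 'B', 'B', 'C', 'D']
```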

Update: More on grading from Orin Kerr here.

Tuesday, 24 July 2007

PolMeth Postmortem

Michelle Dion has posted her thoughts on the recently-concluded political methodology conference at Penn State. I’ll echo her kudos to the organizers among the Penn State faculty and grad students, most notably Burt Monroe (who took time out to check in with the participants over the course of the meeting) and Suzie DeBoef. I also got some useful feedback and interest regarding the poster, which will be strong motivation to finish up the paper and get it out to the Working Papers archive and off to Political Analysis.

Like Michelle, I do wonder sometimes about the ability of the “core group” to reach out to the practitioners who don’t attend PolMeth and whose dues support the viability of the section and its journal. Notably, there has been some discussion of the section getting more actively involved in the Teaching Research Methods track at the APSA Teaching and Learning Conference, although I wonder if there is an awareness of what that track has done in the past on the part of the appointed committee (I’m pretty sure none of its members have been within 100 miles of a past TLC, and only one represents a non-research-oriented department), which may make for some interesting toe-trampling over the next few months.

My departure from State College was rather more eventful than one might have hoped; Northwest cancelled my 6:00 a.m. flight to Detroit and rebooked me on Delta via Atlanta, an airport which I’m pretty sure is foreseen somewhere in Dante’s works. As a special bonus I also got to enjoy the thrill and excitement of being SSSS’d by TSA. The good news is that at least I made it back in one piece.

Anyway, back to packing; Dad arrives tomorrow and I’d like it to look like I’ve made at least a modicum of progress here.

Saturday, 21 July 2007

Poster presented, time to pack

The poster presentation today went moderately well, all things considered, and a few people indicated interest in seeing the completed paper in the near future. Compared to the other projects on my plate, that may be comparatively easy to do.

The only real extension I want to do for now is to tweak the R simex package to allow the error variances for covariates to differ across observations; I also think I can clean up the call syntax to make it a bit more “R-like,” but that has less to do with the paper proper—except that cleaning up the call syntax will make it easier to implement my tweak.
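For readers unfamiliar with SIMEX, the tweak is easy to motivate with a toy version of the algorithm: add extra measurement noise at several multipliers λ, refit the naive estimator each time, and extrapolate the attenuated estimates back to λ = −1. A sketch in Python with observation-specific error variances (all data simulated; the real work lives in the R simex package):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x_true = rng.normal(size=n)
sigma_u = rng.uniform(0.3, 0.7, size=n)        # per-observation error s.d.
x_obs = x_true + sigma_u * rng.normal(size=n)  # mismeasured covariate
y = 1.0 + 2.0 * x_true + rng.normal(scale=0.5, size=n)

def naive_slope(x, y):
    """OLS slope of y on x, ignoring measurement error (attenuated)."""
    return np.polyfit(x, y, 1)[0]

# Step 1: at each lambda, add extra noise scaled by each observation's
# own error s.d., refit, and average over simulations.
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
slopes = [np.mean([naive_slope(x_obs + np.sqrt(lam) * sigma_u * rng.normal(size=n), y)
                   for _ in range(50)])
          for lam in lambdas]

# Step 2: fit a quadratic in lambda and extrapolate to lambda = -1,
# i.e., to "no measurement error".
coef = np.polyfit(lambdas, slopes, 2)
simex_slope = np.polyval(coef, -1.0)
print(round(float(simex_slope), 2))  # much closer to the true slope of 2 than slopes[0]
```

The per-observation scaling in step 1 is the point of the tweak: the stock algorithm typically assumes a single error variance shared by all observations.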

Since I have a lovely 6 am flight tomorrow, I’ve spent much of the afternoon packing and getting ready for the trip back to St. Louis; I’ll probably wander towards the closing reception in a little while, once everything’s close to organized for the morning.

Worthy announcements lost in feed space

For the R fans in the audience: Dirk Eddelbuettel announces CRANberries, a blog that automatically tracks new and updated packages/bundles in CRAN (the Comprehensive R Archive Network); CRANberries is also carried by the Planet R aggregator.

I also learned that you can combine your favorite RSS and Atom feeds with pictures of cats, although just why you’d want to do this is beyond my comprehension.

Monday, 16 July 2007

Poster done; time for sleep

Well, except for the “printing the poster” part, but I have a hookup for that.

It’s a little light on the pretty graphs and way too heavy on text, but I don’t think I had much to graph that would be worthwhile. And the text is important; or, at least, I think so, since I wrote it. And it’s probably halfway to being a paper, particularly once you put back in the stuff I commented out to get it to fit on a (really really big) page.

For my readers who won’t be in State College—or, for those who will and don’t feel like dropping by the faculty poster session—you can check it out here. It came out surprisingly well, considering that as of 48 hours ago I had approximately nothing after thinking I’d hit a brick wall.

The real geeks will be interested to know that this is the first time I used XeLaTeX, the fontspec package, and the sciposter documentclass. The body text is set in DejaVu Sans Condensed and fixed-pitch text is in Inconsolata, which are two of my favorite typefaces (and beat the hell out of the defaults, which were Helvetica and Courier).

Sunday, 15 July 2007

Fake this

I have determined that I am not very good at algorithmically generating fake roll call data. It may be time (tomorrow) for Plan B on the methods meeting poster, which will probably involve not doing anything with fake data but instead doing stuff with ideal point estimates with higher estimated error variance than ones derived from congressional roll calls.

Tuesday, 10 July 2007

Lead me not into temptation

From an email from the methods meeting cohost to poster presenters:

Please post your paper, if there is one, to the Society’s Working Paper Archive.

Must… resist… urge… to… only… analyze… data.

Monday, 2 July 2007

Partisanship and the DH rule

For my political scientist reader who thinks the DH rule is an abomination: Chris Zorn and Jeff Gill on partisanship and support for the designated hitter rule in baseball. Mind you, I can’t tell if their extended literature review is intended to be taken seriously or is a parody; the following sentence suggests the latter:

By allowing pitchers to avoid hitting, and some batters to avoid fielding, the DH rule is suggestive of a larger-scale decline in the culture of personal responsibility in America over the past several decades.

I look forward to similar contributions on Americans’ attitudes towards soccer and the relationship between individuals’ attitudes toward foreign aid and interest in hockey.

þ: Dan Drezner and Henry Farrell.

Tuesday, 26 June 2007

Methods meeting paper, day two

With a little Python scripting (to pull down a few constituency statistics for every House member from the online Almanac of American Politics) and about an hour of grunt work (coding the race and gender of 435 House members), I now have some covariates to play with for my methods meeting paper. About halfway through it occurred to me that I could have just recycled the 105th Congress data I compiled for the Damn Impeachment Paper™ and used it, but maybe this will be fresher and more interesting even though it’s all just illustrative anyway.

Next steps: hack together some R code to run some simulations and start thinking about how to get this down on paper, in both prose and poster format.