The oft-promised update of the cnlmisc package for R is now posted. New in this release is a convenience method, sepplot, which produces separation plots using the separationplot package; it works directly on model fit objects as a post-estimation call and, at present, supports both binary and ordinal models. In addition, epcp now works with clm2 objects from the ordinal package.
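A rough sketch of the new calls (the exact argument lists may differ, and the binary-model data and variables here are hypothetical; the wine data ship with the ordinal package):

```r
library(cnlmisc)   # not on CRAN yet
library(ordinal)

## binary response: a separation plot straight from the fitted model,
## via the separationplot package under the hood
fit <- glm(voted ~ income + partisanship, family = binomial, data = nes)
sepplot(fit)

## ordinal response: epcp() now accepts clm2 fits as well
ofit <- clm2(rating ~ temp * contact, data = wine)
epcp(ofit)
```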
Most of this was motivated by continued work on the economic voting paper, which has also been updated. cnlmisc still has a long way to go before I submit it to CRAN, but at least it’s progress, right?
I’ve finally packaged up a very rough port of my epcp routine from Stata to R as part of a package unimaginatively called cnlmisc; you can download it here. In addition to the diagnostics that the Stata routine provides, the glm method includes a bunch of R-squared-like measures from various sources (including Greene and Long).
The only part I’m sure works at the moment is the epcp for glm objects (including survey’s and Zelig’s wrappers thereof); the others that are coded (for polr and VGAM) are probably half-working or totally broken, and some wrappers aren’t there yet at all. The error bounds suggested by Herron aren’t there either. The print routines need a lot of work too; eventually it will have a nice toLatex() wrapper as well. But it beats having it sit on my hard drive gathering dust; plus I may eventually get motivated to write a JSS piece or something based on it.
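Intended usage is a one-line post-estimation call on an existing fit; a sketch (the model and data are made up, and the exact output format is still in flux):

```r
library(cnlmisc)   # from the link above, not CRAN

## epcp() as a post-estimation call on a glm fit; it reports the
## expected proportion correctly predicted plus the pseudo-R-squared
## measures mentioned above
fit <- glm(turnout ~ age + education, family = binomial, data = mydata)
epcp(fit)
```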
epcp for Stata is still available at my site. For more information on the measure, see Michael C. Herron (1999), “Postestimation Uncertainty in Limited Dependent Variable Models,” Political Analysis 8(1): 83–98, or Moshe Ben-Akiva and Steven Lerman (1985), Discrete Choice Analysis, MIT Press.
The bits of paper I’m hanging on the wall tomorrow (well, later today) at Midwest, wherein I discuss “Geographic Data Visualization Using Open-Source Data and R,” are now online here for the curious or insomnia-stricken.
From the description of the memisc package for R:
One of the aims of this package is to make life easier for useRs who deal with survey data sets. It provides an infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) SPSS and Stata files. Further, it provides functionality to produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates. Also some convenience tools for graphics, programming, and simulation are provided. [emphasis added]
How did I miss this package before? It makes analyzing NES data—heck, any data with value labels and missing values—in R an almost sane thing to do.
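A taste of what that looks like in practice (a sketch: the file name and variable names are hypothetical, but spss.system.file, as.data.set, codebook, and mtable are all real memisc functions):

```r
library(memisc)

## import an SPSS file; value labels and user-defined missing
## values survive the trip ("nes2004.sav" is hypothetical)
nes <- as.data.set(spss.system.file("nes2004.sav"))
codebook(nes)   # a codebook of value labels, missing definitions, etc.

## an (almost) publication-ready table of model estimates
d <- as.data.frame(nes)
mtable(lm(thermometer ~ partyid + income, data = d))
```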
Reading CRANberries this morning, I remembered that I’d never gotten around to packaging Amelia for Debian. So I dutifully filed my ITP and got to work on adapting the package to build on Debian, which, thanks to Dirk’s hackery on R support in cdbs, was pretty easy: copy over the debian directory from my Zelig package, update the copyright file, fix up the control file, update the Debian changelog, fix a lintian warning or two (FSF address in the copyright file), and it’s basically done.
Then I discovered that Amelia also throws in a couple of Tcl/Tk libraries. One, BWidget, is already packaged, so all I had to do was delete the copy installed by the Amelia package and add a dependency on it. The other is Combobox, the exact license of which follows:
completely, totally, free. I retain copyright but you are free to use the code however you see fit. Don’t be mean.
Yay. I get to play license negotiator again. I really love creating extra work I don’t need for myself…
Andrew Gelman notes that the default graphics functions suck and that R has no real idea that all numbers aren’t conceptually signed floats. Gelman is told that the default graphics functions aren’t the ones we’re supposed to use these days (Trellis graphics, a.k.a. lattice, and a bunch of stuff I’d never heard of before today are preferable) and that R does have some idea that all numbers aren’t floats; you just have to convince R that the numbers you have aren’t floats, or something.
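For what it’s worth, the integer type is real but strictly opt-in, which rather proves the point:

```r
x <- c(1, 2, 3)
is.integer(x)         # FALSE: numeric literals are stored as doubles
y <- as.integer(x)    # explicit coercion is the only way in
is.integer(y)         # TRUE
typeof(x); typeof(y)  # "double", "integer"
```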
I think Gelman wins the argument by default.
The poster presentation today went moderately well, all things considered, and a few people indicated interest in seeing the completed paper in the near future. Compared to the other projects on my plate, that may be relatively easy to deliver.
The only real extension I want to make for now is to tweak the R simex package to allow the error variances for covariates to differ across observations; I also think I can clean up the call syntax to make it a bit more “R-like,” though that has less to do with the paper proper, except that cleaning up the call syntax will make my tweak easier to implement.
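As it stands, simex() takes a single measurement-error standard deviation per mismeasured covariate, which is exactly the limitation I want to relax; roughly (argument names from memory, so caveat lector):

```r
library(simex)

set.seed(42)
x <- rnorm(200)                  # true covariate (unobserved)
w <- x + rnorm(200, sd = 0.4)    # observed with measurement error
y <- 2 * x + rnorm(200)

## naive fit on the mismeasured covariate; x = TRUE saves the model
## matrix so simex() can refit the model at each added-noise level
naive <- lm(y ~ w, x = TRUE)

## note: one measurement.error SD applies to every observation;
## my tweak would allow a vector of per-observation SDs instead
corrected <- simex(naive, SIMEXvariable = "w", measurement.error = 0.4)
summary(corrected)
```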
Since I have a lovely 6 am flight tomorrow, I’ve spent much of the afternoon packing and getting ready for the trip back to St. Louis; I’ll probably wander towards the closing reception in a little while, once everything’s close to organized for the morning.
For the R fans in the audience: Dirk Eddelbuettel announces CRANberries, a blog that automatically tracks new and updated packages/bundles in CRAN (the Comprehensive R Archive Network); CRANberries is also carried by the Planet R aggregator.
I also learned that you can combine your favorite RSS and Atom feeds with pictures of cats, although just why you'd want to do this is beyond my comprehension.
Oddly enough, the graphics package code that I was using to add error bars to my dotcharts has mysteriously stopped working since I upgraded to R 2.4.0. I can still make the dotcharts using dotchart, but the error bars don’t show up after adding them using segments. This clearly worked last month; otherwise I wouldn’t have had a presentation to show at Mizzou.
Luckily enough, I found another solution, using dotplot in lattice instead, in an article by Bill Jacoby in the most recent issue of The Political Methodologist… which I probably should have read before hacking together the code the first time around. So now it works… at least until R 2.5.0 comes out, at which point all bets are off.
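For posterity, the general shape of the lattice approach (my own minimal reconstruction with made-up numbers, not Jacoby’s code):

```r
library(lattice)

est <- c(1.20, 0.80, -0.30)    # point estimates
se  <- c(0.20, 0.15, 0.25)     # standard errors
lab <- factor(c("Income", "Education", "Age"),
              levels = c("Income", "Education", "Age"))

## dotplot() with a custom panel that draws 95% CI bars via
## panel.segments() before plotting the points on top
dotplot(lab ~ est,
        panel = function(x, y, ...) {
          panel.segments(x - 1.96 * se, as.numeric(y),
                         x + 1.96 * se, as.numeric(y))
          panel.xyplot(x, y, pch = 16, ...)
        },
        xlab = "Coefficient estimate")
```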
The piece that Dirk and I wrote for The Political Methodologist on Quantian is now out in the Fall 2005 issue. Alongside it are a mostly glowing review of Stata 9 by Neal Beck, which will no doubt annoy the R purists, as he suggests he will be ditching R in favor of Stata in his graduate methods courses; a review of a new book on event-history analysis by Kwang Teo, whose apartment floor I once slept on in Nashville; and an interesting piece on doing 3-D graphics in R.
In other methods news, I had the privilege (along with a packed house) of hearing Andrew Gelman of Columbia speak this afternoon on his joint research on the relationship between vote choice and income in the states, which uses some fancy multi-level modeling stuff that I have yet to play much with.
Incidentally, it was fun to see someone else who uses latex-beamer for their presentations; I could tell the typeface was the standard TeX sf (sans-serif) face, but I wasn’t sure off-hand which beamer theme Andrew was using.
I have just wasted about two hours of my life trying to figure out how to make R draw a line graph (all I want to do is plot the conditional mean of a variable on the Y axis for certain categories of another variable) to stick in my undergraduate methods lecture for tomorrow—a graph I could have constructed trivially in Stata, Excel, or SPSS in about 15 seconds. This is patently ridiculous.
I am not an idiot; this should not be so hard to figure out. I like R, but it is actively user-hostile (even with Rcmdr and other packages loaded), and until it ceases to be such I will not foist it on my students.
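For the record, here’s the sort of thing I mean; even the most direct route I know of (with the built-in mtcars data standing in for mine) takes a tapply() plus a hand-rolled plot() call:

```r
## mean mpg within each category of cyl, then a line graph of the means
m <- tapply(mtcars$mpg, mtcars$cyl, mean)
plot(as.numeric(names(m)), m, type = "b",
     xlab = "Cylinders", ylab = "Mean MPG")
```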